Suma Adindla, University of Essex |
NLP and IR for Intranet Search
Natural language processing (NLP) is becoming much more robust and applicable in realistic applications. One area in which NLP has still not been fully exploited is information retrieval (IR). In particular, we are interested in search over intranets and other local Web sites. We see dialogue-driven search, based on a largely automated knowledge extraction process, as one of the next big steps. Instead of replying to a user query with a set of documents, the system would allow the user to navigate through the extracted knowledge base by making use of a simple dialogue manager. Here we support this idea with a first task-based evaluation that we conducted on a university intranet. We automatically extracted entities such as person names, organisations and locations, as well as relations between entities, and added visual graphs to the search results whenever a user query could be mapped into this knowledge base. We found that users are willing to interact with and use those visual interfaces. We also found that users preferred such a system, which guides a user through the result set, over a baseline approach. The results represent an important first step towards fully NLP-driven intranet search.
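To make the idea concrete, mapping a query into an extracted knowledge base of entities and relations might be sketched as below. The entities, relation names and query are invented for illustration and are not taken from the actual system:

```python
# Toy knowledge base of (subject, relation) -> object triples, as might be
# extracted automatically from an intranet. All entries are invented examples.
knowledge_base = {
    ("Jane Smith", "member_of"): "School of Computer Science",
    ("School of Computer Science", "located_in"): "Colchester Campus",
}

def related_facts(query):
    """Return all triples whose subject entity is mentioned in the query."""
    return [
        (subj, rel, obj)
        for (subj, rel), obj in knowledge_base.items()
        if subj.lower() in query.lower()
    ]

# A matched query can then seed a visual graph or a dialogue turn.
print(related_facts("Which school is Jane Smith a member of?"))
```

A real system would of course use automatically extracted entities and a proper dialogue manager rather than substring matching.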
|
Jan Botha, University of Oxford |
Language Models that Dine on Morphemes in a Chinese Restaurant
Statistical language models are crucial in ensuring that statistical machine translation systems generate fluent translations. However, standard n-gram language models do not model word morphology effectively. Especially for morphologically rich languages, the failure to leverage morphological information means that, at test time, those models give less robust probability estimates for rare words and respond to words outside the training vocabulary in a naive way.
We seek to address these problems by extending an existing hierarchical Bayesian language model to include an explicit account of morphology. We present preliminary results from an intrinsic evaluation task.
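As a rough illustration of the "Chinese restaurant" flavour of such Bayesian models, the predictive probability of a morpheme under a simple (non-hierarchical) CRP/Dirichlet prior mixes observed counts with a base distribution, so unseen items keep non-zero probability. This is a hand-rolled sketch of the general mechanism, not the authors' hierarchical model:

```python
from collections import Counter

def crp_predictive(item, counts, alpha, base_prob):
    """P(item | seen data) under a simple CRP/Dirichlet prior: reuse an
    observed item in proportion to its count, or fall back to the base
    distribution with weight alpha."""
    total = sum(counts.values())
    return (counts.get(item, 0) + alpha * base_prob) / (total + alpha)

# Toy morpheme counts; an unseen morpheme still gets non-zero probability.
counts = Counter({"walk": 3, "ing": 1})
print(crp_predictive("walk", counts, alpha=1.0, base_prob=0.25))
print(crp_predictive("unseen", counts, alpha=1.0, base_prob=0.25))
```

In a hierarchical model the base distribution would itself be another such process, e.g. one defined over morpheme sequences.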
|
Ching-Yun Chang, University of Cambridge |
Practical Linguistic Steganography using Synonym Substitution
Linguistic steganography is concerned with hiding information in a natural language text, for the purposes of sending secret messages.
Linguistic steganography algorithms hide information by manipulating properties of the text, for example by replacing some words with their synonyms. Unlike image-based steganography, linguistic steganography is in its infancy, with little existing work. In this talk I will motivate the problem, in particular as an interesting application for NLP and especially natural language generation. Linguistic steganography is a difficult NLP problem because any change to the cover text must retain the meaning and style of the original, in order to prevent detection by an adversary.
Our method embeds information in the cover text by replacing words in the text with appropriate substitutes. We use a large database of word sequences collected from the Web (the Google n-gram data) to determine if a substitution is acceptable, obtaining promising results from an evaluation in which human judges are asked to rate the acceptability of modified sentences.
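The acceptability check can be sketched as follows. The trigram counts below are a tiny invented stand-in for the Google n-gram data, and the frequency threshold is an illustrative parameter, not the paper's actual criterion:

```python
# Invented trigram counts standing in for the Google n-gram data.
ngram_counts = {
    ("a", "large", "dog"): 5200,
    ("a", "big", "dog"): 48000,
    ("a", "sizeable", "dog"): 3,
}

def acceptable(prev_word, candidate, next_word, threshold=100):
    """Accept a synonym substitution only if the resulting trigram is
    frequent enough in the corpus (a crude fluency proxy)."""
    return ngram_counts.get((prev_word, candidate, next_word), 0) >= threshold

print(acceptable("a", "big", "dog"))       # frequent trigram: accepted
print(acceptable("a", "sizeable", "dog"))  # rare trigram: rejected
```

A substitution that passes the check can then carry one or more bits of the hidden message, depending on how many acceptable synonyms share the slot.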
|
Mahmoud El-Haj, University of Essex |
Creating an Arabic Multi-Document Summaries Corpus
In this presentation we talk about our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation for evaluating the generated Arabic multi-document summaries against English extractive gold standards. We first address the lack of Arabic multi-document summary corpora, in addition to the absence of the automatic and manual Arabic gold-standard summaries required to evaluate any system summaries generated by Arabic summarisers. Secondly, we demonstrate the use of Google Translate in creating an Arabic version of the DUC 2002 dataset. The parallel Arabic/English dataset will be summarised using the Arabic and English summarisation systems, and the ROUGE metric will be used to evaluate the summaries automatically.
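For illustration, ROUGE-1 recall, the simplest member of the ROUGE family, measures unigram overlap between a system summary and a gold-standard reference. A minimal sketch, not the official ROUGE implementation:

```python
from collections import Counter

def rouge1_recall(system_summary, reference_summary):
    """Fraction of reference unigrams also found in the system summary,
    with counts clipped at the reference frequency (ROUGE-1 recall)."""
    sys_counts = Counter(system_summary.lower().split())
    ref_counts = Counter(reference_summary.lower().split())
    overlap = sum(min(n, sys_counts[w]) for w, n in ref_counts.items())
    return overlap / sum(ref_counts.values())

print(rouge1_recall("the summit opened today",
                    "the summit opened in geneva today"))  # 4 of 6 unigrams
```

Full ROUGE additionally reports n-gram and longest-common-subsequence variants, averaged over multiple references.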
|
Karl Moritz Hermann, University of Oxford |
Extracting and Resolving Fused-Head Noun Phrases
Fused-head noun phrases (FHNPs) are a form of anaphora in which the head of a noun phrase has been fused with its modifier. I will describe an approach for recognising instances of FHNPs in parsed text and for resolving them by first paraphrasing with a pronoun, and then using an existing pronoun coreference resolution model. The initial FHNP-recognition system is rule-based and achieves an F1-score of 81.2% on preliminary tests. The proposed method for resolving the anaphoric element of FHNPs achieves an F1-score of 62.8%.
|
Sharon Moyo, The Open University |
Effective Tutoring with Affective Embodied Conversational Agents
This natural language generation project aims to investigate the impact of affect
expression using embodied conversational agents (ECAs) in computer-based learning
environments. Based on the idea that there is a link between emotions and learning,
we are developing an affect expression strategy. Current research has not firmly
established the impact of affect expression strategies within tutorial feedback which
supports learners in computer-based learning environments. Our approach is to
provide affective support through empathy. We are conducting a series of studies to
investigate the impact on learners. The first evaluation confirms that using speech,
facial expression and gesture can generate recognisable empathic ECA expressions.
Our second study suggests that although there is no overall effect on all learners,
girls and high and middle ability learners may benefit from empathic interventions.
We intend to improve our implementation and continue to develop a framework on the
impact of empathic feedback strategies in tutoring systems.
|
Tu Anh T. Nguyen, The Open University |
Accessible Explanations for Entailments in OWL Ontologies
For debugging OWL-DL ontologies, natural language explanations of inconsistencies and
undesirable entailments are of great help. From such explanations, ontology developers can
learn why an ontology gives rise to specific entailments. Unfortunately, commonly used
tableaux-based reasoning services do not provide a basis for such explanations, since they
rely on a refutation proof strategy and normalising transformations that are difficult for
humans to understand. For this reason, we investigate the use of automatically generated
justifications for entailments (i.e., minimal sets of axioms from the ontology that cause
entailments to hold) as a basis for generating such explanations.
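The notion of a justification, a minimal axiom set from which the entailment still follows, can be illustrated with a brute-force sketch over a toy entailment check. Real systems compute justifications with optimised algorithms on top of a DL reasoner rather than exhaustive search, and the axiom strings below are invented:

```python
from itertools import combinations

def justifications(axioms, entails):
    """All minimal subsets of axioms for which entails(subset) holds.
    Exhaustive search over subsets, feasible only for toy inputs."""
    found = []
    for size in range(1, len(axioms) + 1):
        for subset in combinations(axioms, size):
            if entails(set(subset)) and \
               not any(set(j) <= set(subset) for j in found):
                found.append(subset)
    return found

# Toy ontology: entailing "A subclassOf C" needs exactly the first two axioms.
axioms = ["A subclassOf B", "B subclassOf C", "C subclassOf D"]
entails = lambda s: {"A subclassOf B", "B subclassOf C"} <= s
print(justifications(axioms, entails))
```

Each justification returned this way can then serve as the input to a natural language generation step that verbalises why the entailment holds.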
|
Brian Plüss, The Open University |
Conversational Games, Discourse Obligations and Non-Cooperative Dialogue
We present ongoing research on modelling dialogue management for non-cooperative conversational agents. We start by describing our understanding of non-cooperative linguistic behaviour in dialogue. Then, we discuss conversational games and discourse obligations, paying special attention to their limitations for addressing such behaviour.
Finally, we propose a way to combine these approaches in order to model non-cooperation, and suggest an implementation.
|
Sharhida Zawani Saad, University of Essex |
Applying Web Usage Mining for Adaptive Intranet Navigation
Much progress has recently been made in assisting a user in the search process, be it Web search, where the big search engines have all incorporated more interactive features, or online shopping, where customers are commonly recommended items that appear to match their interests. Surprisingly little progress has, however, been made in making navigation of a Web site more adaptive. Web sites can be difficult to navigate as they tend to be rather static, and a new user has no idea which documents are most relevant to his or her needs. We try to assist a new user by exploiting the navigation behaviour of previous users. On a university Web site, for example, the target users change constantly. We propose to exploit the navigation behaviour of existing users so that we can make the Web site more adaptive by introducing links and suggestions to commonly visited pages without changing the actual Web site. This work reports on a task-based evaluation demonstrating that the idea is very effective. Introducing suggestions as outlined above was not only preferred by the users in our study but also allowed them to reach the results more quickly.
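A minimal sketch of the underlying idea: mine page-to-page transitions from previous users' navigation sessions and surface the most common next pages as suggested links. The session data and page names are invented for illustration:

```python
from collections import Counter, defaultdict

def build_suggestions(sessions, k=3):
    """For each page, return the k pages most often visited next,
    based on logged navigation sessions (lists of visited pages)."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for here, nxt in zip(session, session[1:]):
            transitions[here][nxt] += 1
    return {page: [p for p, _ in counts.most_common(k)]
            for page, counts in transitions.items()}

sessions = [
    ["home", "courses", "timetable"],
    ["home", "courses", "exams"],
    ["home", "courses", "timetable"],
]
print(build_suggestions(sessions)["courses"])  # most popular next pages first
```

Because the suggestions are computed from logs, they can be overlaid on the existing pages without changing the Web site itself.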
|
Hassan Saif, The Open University |
Sentiment Analysis from Short Textual Data
With the vast spread of social networks and microblogging websites, social media tools are now considered among the most important channels of data exchange over the Internet. Facebook and Twitter have become the two most popular social networking sites. Millions of status updates and tweet messages, which reflect people's opinions and attitudes, are created and sent every day. While Facebook has a limit of 420 characters for status updates, Twitter has a 140-character limit. This poses a huge challenge: how to efficiently extract opinions and attitudes from such a large amount of short textual data. In this work, we highlight the challenges faced when dealing with short textual data and propose a few potentially feasible approaches to tackle them.
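As a baseline illustration of sentiment analysis on short texts, a simple lexicon-based polarity classifier counts positive and negative words. The tiny lexicon below is invented; practical work uses much larger lexicons or learned models, precisely because short, informal messages defeat naive word counting:

```python
# Tiny invented sentiment lexicon; real systems use far larger resources.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}

def classify(text):
    """Classify a short text by comparing positive and negative word counts."""
    tokens = text.lower().split()
    score = (sum(t in POSITIVE for t in tokens)
             - sum(t in NEGATIVE for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("love this great phone"))
print(classify("awful battery and awful screen"))
```

Abbreviations, hashtags, misspellings and negation ("not good") all break this kind of baseline, which is what makes the short-text setting challenging.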
|