FLATLANDS 2011

Workshop on natural language processing research for postgraduate students at Cambridge, Essex, Open, and Oxford Universities

Thursday, 30th June, 2011 at The Open University, Walton Hall, Milton Keynes, UK

Call for Presentations and Participation

The Flatlands workshop is an annual meeting of the NLP groups at Cambridge, Essex, Open and Oxford universities. It gives these communities an opportunity to meet and to learn, through presentations, about recent work by research students at the four sites. We encourage postgraduate research students to give short presentations (15-20 minutes) on topics relating to NLP (natural language understanding, natural language generation, dialogue, text mining, information extraction, etc.). We welcome participation from postgraduate students in NLP at the four institutions, as well as their supervisors and colleagues.

Previous meetings have been held at Cambridge (2005), Oxford (2006), London (2007), Essex (2008), Cambridge (2009), and Oxford (2010). This year's Flatlands meeting will take place at the OU campus in Milton Keynes on Thursday the 30th of June.

The workshop is free, but please let us know if you plan to attend. If you would like to present your research, send us your name, affiliation, the title of your talk and a short abstract (100-150 words) by 17th June. Presentations should focus on novel aspects of your research; please seek advice from your supervisors on suitable topics for presentation.

Please contact the organisers if you have any queries.


Information

Date

Thursday, 30th June 2011

Deadline

Friday, 17th June 2011

Venue

Meeting Room 1, Jennie Lee Building, The Open University, Walton Hall, Milton Keynes, MK7 6AA

Presenters

Please e-mail your name, affiliation, title and a short abstract (100-150 words) to the organisers by 17th June.

Attendees

Please e-mail the organisers by 17th June if you want to attend.

Schedule

Programme

Travel directions

Directions, maps and a campus map are available. On arrival, please report to the OU Reception in the Berrill Building. If you are coming by car, please let us know so that we can arrange a parking permit.

Presentations

Suma Adindla

University of Essex

NLP and IR for Intranet Search

Natural language processing (NLP) is becoming much more robust and applicable in realistic applications. One area in which NLP has still not been fully exploited is information retrieval (IR). In particular, we are interested in search over intranets and other local Web sites. We see dialogue-driven search based on a largely automated knowledge extraction process as one of the next big steps. Instead of replying to a user query with a set of documents, the system would allow the user to navigate through the extracted knowledge base by means of a simple dialogue manager. Here we support this idea with a first task-based evaluation conducted on a university intranet. We automatically extracted entities such as person names, organisations and locations, as well as relations between entities, and added visual graphs to the search results whenever a user query could be mapped into this knowledge base. We found that users are willing to interact with and use these visual interfaces. We also found that users preferred a system that guides them through the result set over a baseline approach. The results represent an important first step towards full NLP-driven intranet search.

Jan Botha

University of Oxford

Language Models that Dine on Morphemes in a Chinese Restaurant

Statistical language models are crucial in ensuring that statistical machine translation systems generate more fluent translations. However, standard n-gram language models do not model word morphology effectively. Especially for morphologically rich languages, the failure to leverage morphological information implies that, at testing time, those models give less robust probability estimates for rare words and respond to words outside the training vocabulary in a naive way. We seek to address these problems by extending an existing hierarchical Bayesian language model to include an explicit account of morphology. We present preliminary results from an intrinsic evaluation task.

Ching-Yun Chang

University of Cambridge

Practical Linguistic Steganography using Synonym Substitution

Linguistic steganography is concerned with hiding information in a natural language text for the purpose of sending secret messages.
Linguistic steganography algorithms hide information by manipulating properties of the text, for example by replacing some words with their synonyms. Unlike image-based steganography, linguistic steganography is in its infancy, with little existing work. In this talk I will motivate the problem, in particular as an interesting application for NLP and especially natural language generation. Linguistic steganography is a difficult NLP problem because any change to the cover text must retain the meaning and style of the original, in order to prevent detection by an adversary.
Our method embeds information in the cover text by replacing words in the text with appropriate substitutes. We use a large database of word sequences collected from the Web (the Google n-gram data) to determine if a substitution is acceptable, obtaining promising results from an evaluation in which human judges are asked to rate the acceptability of modified sentences.

Mahmoud El-Haj

University of Essex

Creating an Arabic Multi-Document Summaries Corpus

In this presentation we describe our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation for evaluating the generated Arabic multi-document summaries against English extractive gold standards. We first address the lack of Arabic multi-document summary corpora, as well as the absence of the automatic and manual Arabic gold-standard summaries required to evaluate any system summaries generated by Arabic summarisers. Secondly, we demonstrate the use of Google Translate in creating an Arabic version of the DUC2002 dataset. The parallel Arabic/English dataset will be summarised using the Arabic and English summarisation systems, and the ROUGE metric will be used to evaluate the summaries automatically.

Karl Moritz Hermann

University of Oxford

Extracting and Resolving Fused-Head Noun Phrases

Fused-head noun phrases (FHNP) are a form of anaphora in which the head of a noun phrase has been fused with its modifier. I will describe an approach for recognising instances of FHNP in parsed text, and for resolving these by first paraphrasing with a pronoun and then applying an existing pronoun coreference resolution model. The initial FHNP-recognition system is rule-based and achieves an 81.2% F1-score on preliminary tests. The proposed method for resolving the anaphoric element of FHNP achieves an F1-score of 62.8%.

Sharon Moyo

The Open University

Effective Tutoring with Affective Embodied Conversational Agents

This natural language generation project aims to investigate the impact of affect expression using embodied conversational agents (ECAs) in computer-based learning environments. Based on the idea that there is a link between emotions and learning, we are developing an affect expression strategy. Current research has not firmly established the impact of affect expression strategies within tutorial feedback that supports learners in computer-based learning environments. Our approach is to provide affective support through empathy, and we are conducting a series of studies to investigate the impact on learners. The first evaluation confirms that using speech, facial expression and gesture can generate recognisable empathic ECA expressions. Our second study suggests that although there is no overall effect on all learners, girls and high- and middle-ability learners may benefit from empathic interventions. We intend to improve our implementation and continue to develop a framework on the impact of empathic feedback strategies in tutoring systems.

Tu Anh T. Nguyen

The Open University

Accessible Explanations for Entailments in OWL Ontologies

For debugging OWL-DL ontologies, natural language explanations of inconsistencies and undesirable entailments are of great help. From such explanations, ontology developers can learn why an ontology gives rise to specific entailments. Unfortunately, commonly used tableaux-based reasoning services do not provide a basis for such explanations, since they rely on a refutation proof strategy and normalising transformations that are difficult for humans to understand. For this reason, we investigate the use of automatically generated justifications for entailments (i.e., minimal sets of axioms from the ontology that cause entailments to hold) as a basis for generating such explanations.

Brian Plüss

The Open University

Conversational Games, Discourse Obligations and Non-Cooperative Dialogue

We present ongoing research on modelling dialogue management for non-cooperative conversational agents. We start by describing our understanding of non-cooperative linguistic behaviour in dialogue. Then, we discuss conversational games and discourse obligations, paying special attention to their limitations for addressing such behaviour.
Finally, we propose a way to combine these approaches in order to model non-cooperation, and suggest an implementation.

Sharhida Zawani Saad

University of Essex

Applying Web Usage Mining for Adaptive Intranet Navigation

Much progress has recently been made in assisting a user in the search process, be it Web search, where the big search engines have all incorporated more interactive features, or online shopping, where customers are commonly recommended items that appear to match their interests. Surprisingly little progress has, however, been made in making navigation of a Web site more adaptive. Web sites can be difficult to navigate as they tend to be rather static, and a new user has no idea which documents are most relevant to his or her need. We try to assist new users by exploiting the navigation behaviour of previous users. On a university Web site, for example, the target users change constantly. We propose to exploit the navigation behaviour of existing users to make the Web site more adaptive, introducing links and suggestions to commonly visited pages without changing the actual Web site. This work reports on a task-based evaluation demonstrating that the idea is very effective. Introducing suggestions as outlined above was not just preferred by the users in our study but also allowed them to reach results more quickly.

Hassan Saif

The Open University

Sentiment Analysis from Short Textual Data

With the vast spread of social networks and microblogging websites, social media tools are now considered among the most important channels of data exchange over the Internet. Facebook and Twitter have become the top two most popular social networking sites. Millions of status updates and tweet messages, which reflect people's opinions and attitudes, are created and sent every day. While Facebook has a 420-character limit for status updates, Twitter has a 140-character limit. This poses a huge challenge: how to efficiently extract opinions and attitudes from such a sheer amount of short textual data. In this work, we highlight the challenges faced when dealing with short textual data and propose a few potentially feasible approaches to tackle them.


Organisers

Dr Sandra Williams, Dr Paul Piwek, Dr Richard Power, and the Open University NLG Group.
