About Us

The Natural Language Generation group at The Open University's Centre for Research in Computing is a team of computer scientists and computational linguists working together to develop theories and technologies to support the automatic generation of natural language. Broadly speaking, our work focuses on what we call 'Flexible Information Presentation': generating presentations that are appropriate to the context in which they occur. In particular, we are interested in achieving flexibility with respect to:

  • media: incorporating diagrams, pictures, sound and film, rather than being limited to text
  • genre: varying the type of text, e.g., dialogue as well as monologue, personal letters as well as reports, leaflets or technical summaries
  • audience: tailoring documents to specific situations and readerships (e.g., health reports for patients vs their doctors), as regards both what to say and how to say it
  • embodiment: integrating language use with other kinds of behaviour, through either virtual agents or physical robots
  • style: formulating rules that can enumerate all ways of expressing a meaning (given defined resources), and choosing among them by applying stylistic preferences

Our work is supported by funding from national research councils, European research agencies, and industry.

Meetings

We meet every Thursday at 3pm, in Meeting Room 11, 2nd floor, Jennie Lee Building.

Projects

CODA

The aim of the EPSRC-funded CODA project (start date: July 1, 2009) is to develop the theory and technology for automatic transformation of text in monologue form to information-delivering dialogue, specifically dialogue between a 'layman' (e.g., patient or student) and an 'expert' (e.g., doctor or tutor). This work is currently being continued with Dr Stoyanchev at Columbia, Prof. Prendinger at NII (Tokyo), and Mr. Kuyten and Prof. Ishizuka at the University of Tokyo. (Contact: Dr Paul Piwek)

DataMIX

DataMIX is a NESTA-funded collaborative project with six UK universities that aims to explore inclusive means of communicating data. (Contact: Dr Paul Piwek)

ArguEd

The eSTEeM Argumentation Education project (ArguEd) focuses on evidence-based improvement of Virtual Learning Environment-based formative assessment of students' argument analysis and evaluation skills. (Contact: Dr Paul Piwek)

MREDI

The Multimodal Reference in Dialogue project (MREDI) is a collaboration with Dr van der Sluis at Groningen University, Albert Gatt at the University of Malta and Adrian Bangerter at the University of Neuchatel. (Contact: Dr Paul Piwek)

SWAT

SWAT (Semantic Web Authoring Tool), a joint project with the School of Computer Science at the University of Manchester, is funded by the EPSRC. Its aim is to develop an NLG-based editing tool for the Semantic Web. An obstacle to realising the goals of the Semantic Web is that it depends technically on standardised formalisms for representing ontologies and factual data (OWL, RDF) that are inaccessible to most users. SWAT will establish principles for mapping between these formalisms and natural languages (e.g., English), so allowing ontologies and data to be defined and viewed through a transparent medium: interactive texts, generated automatically from the underlying encodings in OWL and RDF.

The work is supported by an advisory board of researchers currently working on Semantic Web applications, with representatives from the World Health Organisation, Siemens, Ordnance Survey, the National Health Service, the Stanford Center for Biomedical Informatics Research, and the MyGrid project. Demonstrator applications will be developed in the domains of data workflow management, medical orders, and travel. The research draws on Manchester's expertise in the theory and application of ontologies, and the OU's experience in the development of knowledge-editing tools based on natural language generation. (Contact: Dr Richard Power)

People and Publications

NLG Group Publications. To see publications for an individual, click on the link next to their name, below, or go to their personal website.

Staff Members

Except where indicated, all e-mail addresses should end with @open.ac.uk

Dr. Richard Power

home page: official / personal, e-mail: r.power, publications

Dr. Paul Piwek

home page: official / personal, e-mail: paul.piwek, publications

Dr. Allan Third

home page: official, e-mail: a.third, publications

Dr. Sandra Williams

home page: official / personal, e-mail: s.h.williams, publications

PhD Student Members

Except where indicated, all e-mail addresses should end with @open.ac.uk

Richard Doust

home page: personal, e-mail: richard.doust at free.fr

Sharon Moyo

e-mail: menziwa at hotmail.com

Tu Anh Nguyen

home page: official, e-mail: t.nguyen

Brian Plüss

home page: official / personal, e-mail: b.pluss

Previous Members

Except where indicated, all e-mail addresses should end with @open.ac.uk

Dr. Svetlana Stoyanchev

home page: official / personal, publications

Eva Banik

home page: personal

Prof. Donia Scott

home page: official, e-mail: D.R.Scott at sussex.ac.uk

Dr. Clara Mancini

home page: official / personal, e-mail: c.mancini

Christian Pietsch

home page: personal

Dr. David Hardcastle

home page: personal

Dr. Catalina Hallett

home page: personal

Gaston Burek

home page: personal

Previous Visitors

Pascal Kuyten

home page: official

Susana Bautista Blasco

home page: official, e-mail: S.B.Blasco

Prof. Ingrid Zukerman

home page: official

Dr. Violeta S. T. D. B. Quental

home page: official

Natalia Grabar

home page: official

Past Projects

NumGen

NumGen was a one-year scoping study that investigated how to express numerical quantities, especially proportions (fractions, percentages and ratios), for different audiences. We built a corpus of numerical expressions that was used for studies of numerical hedges and precision, and a system for planning deep semantic representations for proportions. We also carried out a number of user evaluation studies.
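
As a rough illustration of the kind of choice NumGen studied, the Python sketch below renders one proportion as a percentage, a fraction and a ratio, and adds the hedge "about" when the simple fraction is only approximate. It is not the NumGen system: the function name `express_proportion`, the `max_denominator` cut-off and the choice of "about" as the hedge are assumptions made for this example.

```python
from fractions import Fraction


def express_proportion(value: float, max_denominator: int = 10) -> dict:
    """Render a proportion in three forms, hedging when the form is approximate.

    Hypothetical helper for illustration only; not part of NumGen.
    """
    # Closest fraction whose denominator does not exceed the cut-off.
    approx = Fraction(value).limit_denominator(max_denominator)
    # Hedge only when the simple fraction differs from the true value.
    exact = abs(float(approx) - value) < 1e-9
    hedge = "" if exact else "about "
    return {
        "percentage": f"{value:.0%}",
        "fraction": f"{hedge}{approx.numerator}/{approx.denominator}",
        "ratio": f"{hedge}{approx.numerator} in {approx.denominator}",
    }


if __name__ == "__main__":
    for form, text in express_proportion(0.26).items():
        print(f"{form}: {text}")
```

A smaller `max_denominator` trades precision for simpler fractions, which is one way to think about tailoring the same quantity to different audiences.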

CLEF + CLEF-Services

CLEF and CLEF-Services were a pair of related three-year projects funded by the Medical Research Council as part of the e-Science programme. The projects ran from October 2002 until the end of 2007. CLEF aimed to create a scalable, generic architecture for the capture and management of clinical and other descriptive data, integrated with genomic data and images and linked to literature and web resources. The project consortium was led by the University of Manchester Medical Informatics Group and brought together teams from UCL, the University of Sheffield, the Royal Marsden Hospital and the University of Cambridge. At the OU, we focussed on providing natural language generation technologies for assisting the creation and management of electronic patient records and for generating summaries of clinical data.

Semantic Mining

Semantic Interoperability and Data Mining in Biomedicine was a Network of Excellence (NoE) funded by the European Commission under Framework 6. The general objective of the network was to bridge gaps in European research infrastructure and to facilitate cross-fertilisation between scientific disciplines such as computer science, system engineering and medical/clinical research. The long-term goal of the network was the development of generic methods and tools supporting critical tasks in medical and biomedical informatics, such as data mining, knowledge discovery, knowledge representation, abstraction and indexing of information, semantic-based information retrieval in a complex and high-dimensional information space, and knowledge-based adaptive systems that provide decision support for the dissemination of evidence-based medicine. There were 26 work packages in this project. We were involved in a number of them, including the mobility programmes (WP6), the workshop/tutorial on Natural Language Processing in the Biomedical domain (WP13), the workshop/tutorial on Text Mining and Information Retrieval (WP15), the research activity on a multilingual medical dictionary (WP20) and the research activity on Ontology Engineering (WP21). See also: Mobility Programmes Homepage at OU.

HALO

HALO aimed to develop a "Digital Aristotle": a computer-based knowledge source for students and scientists, allowing scientific experts to formulate textbook knowledge in a knowledge base which students can consult by posing queries at the level of the American AP exams. The project was funded by Vulcan Inc. (Seattle), owned by Microsoft co-founder Paul G. Allen. Two teams were led by SRI International and the German company Ontoprise. The OU belonged to the "DarkMatter" team, led by Ontoprise, which also included Carnegie Mellon University, DFKI, Georgia Institute of Technology, and Intelligent Software Components S.A. (ISOCO). The OU's role in the project was to contribute to the Question Formation component, using a natural-language interface through which students posed queries to the DarkMatter system.

TUNA

TUNA was a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence, for example, when the Knowledge Base uses 'names' such as '#Jones083' that a hearer/reader does not understand, or contains concepts that have no names of their own (e.g., a specific tree or chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. Existing algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (e.g., 'the black dog', 'the book on the table'). Our research aimed to design and implement a new algorithm that generates appropriate descriptions in a far greater variety of situations. The algorithm was intended to be more complete, and to generate expressions that are more appropriate, because it would be based on empirical studies involving corpora and controlled experiments. Thus the project combined (psycho)linguistic, computational and logical challenges.
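
To make the task concrete, the Python sketch below builds a description from a conjunction of atomic properties, the simple class of referring expressions mentioned above. It is loosely in the style of the incremental approach of Dale and Reiter and is not the TUNA algorithm. The identifiers (`#Dog017` and so on), the attribute names and the preference order are all invented for this example.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, str]
Description = List[Tuple[str, str]]

# Toy knowledge base; identifiers and attributes are hypothetical.
DOMAIN: Dict[str, Entity] = {
    "#Dog017": {"type": "dog", "colour": "black", "size": "small"},
    "#Dog042": {"type": "dog", "colour": "white", "size": "small"},
    "#Cat003": {"type": "cat", "colour": "black", "size": "small"},
}
PREFERRED = ["type", "colour", "size"]  # assumed attribute preference order


def distinguishing_description(
    target: str, domain: Dict[str, Entity], preferred: List[str]
) -> Optional[Description]:
    """Pick attribute-value pairs of the target until no distractor remains."""
    distractors = {name for name in domain if name != target}
    description: Description = []
    if not distractors:
        return description
    for attribute in preferred:
        value = domain[target].get(attribute)
        if value is None:
            continue
        # Distractors that do not share this value are ruled out.
        ruled_out = {d for d in distractors if domain[d].get(attribute) != value}
        if ruled_out:
            description.append((attribute, value))
            distractors -= ruled_out
        if not distractors:
            return description
    return None  # the listed attributes cannot single out the target


def realise(description: Description) -> str:
    """Turn the selected properties into a simple English noun phrase."""
    head = next((v for a, v in description if a == "type"), "thing")
    modifiers = [v for a, v in description if a != "type"]
    return " ".join(["the", *modifiers, head])


if __name__ == "__main__":
    chosen = distinguishing_description("#Dog017", DOMAIN, PREFERRED)
    if chosen is not None:
        print(realise(chosen))
```

With this toy domain the opaque name '#Dog017' is replaced by 'the black dog'. The function returns `None` when the listed properties cannot separate the target from the other entities, which is the kind of case where richer descriptions (relations, negations, sets) are needed.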

RAGS

RAGS: A Reference Architecture for Generation Systems. Although the general problem of Natural Language Generation (NLG) remains far from completely understood, the field is starting to produce systems which achieve successful practical deployment. However, a significant barrier to wider exploitation is the lack of a standard view of the generation process as a whole, within which more specialised research can be embedded and against which whole systems can be compared. This project aimed to provide such a view: a 'reference architecture' for natural language generation systems based on an emerging consensus on what such systems should be like. The work aimed to refine this consensus into a more explicit reference architecture specification, identifying principal components and data representations. It has produced example data interfaces and sample implementations of processing modules. Although the resulting architecture may not be a perfect fit for all NLG applications, the intention is that it will be sufficiently general to facilitate sharing of resources and comparative evaluation of different approaches.

WYSIWYM

WYSIWYM: What You See Is What You Meant aimed to allow domain experts to encode their knowledge directly, by interacting with a feedback text, generated by the system, which presents the knowledge defined so far and the options for extending or revising it. Previous knowledge editors have provided a graphical interface so that users can interact with diagrams rather than writing code; WYSIWYM takes a step further, exploiting automatic text generation so that the user interacts with an ordinary natural language document rather than a relatively unfamiliar diagram. In some publications we have referred to this method as Conceptual Authoring.

PILLS

PILLS: Patient Information Language Localisation System. The objective of the PILLS project was to facilitate the development of digital content for the European medical and pharmaceutical products industries through a multilingual authoring application designed to support various sectors, including pharmaceutical developers and manufacturers, health portal publishers, and healthcare eMarketplaces. The project used the Patient Information Leaflet Corpus, which is available here.

ICONOCLAST

ICONOCLAST: Integrating Constraints on Layout and Style. In the ICONOCLAST project, we developed a framework for integrating constraints on the style and layout of the output documents in an NLG system. By interacting with the system, authors are able to determine the optimal set of constraints whose interaction will lead to the production of documents in the desired style and layout.

DRAFTER

DRAFTER: A Drafting Assistant for Technical Writers is an interactive tool designed to assist technical authors in the production of English and French end-user manuals for software systems. Unlike current generation systems, which aim at the automated production of instructions and thus keep the authors out of the loop, DRAFTER is a support tool intended to be integrated into the technical author's working environment, automating some of the more tedious aspects of the author's tasks.

GNOME

GNOME: Generating Nominal Expressions. In the GNOME project, we developed general algorithms which were informed by corpus analysis and results of psycholinguistic studies on how people produce and understand nominal expressions. The resulting algorithms were implemented in the ICONOCLAST project and HCRC's ILEX system.

AGILE

AGILE built a tool which allows a technical author to specify, in a non-linguistic representation, the 'content' of different tasks that can be performed by users of CAD-CAM software. The AGILE system can then automatically express these content specifications in styles appropriate to different sections of a CAD-CAM manual (procedures, ready reference ...) in Bulgarian, Czech and Russian. The generated texts are displayed in a browser as hyperlinked documents. No expertise in knowledge representation is required, although some training with the interface is needed.

Last updated 26th July 2010