The Open University
The Natural Language Generation group at the Open University's Centre for Research in Computing is a team of computer scientists and computational linguists working together to develop theories and technologies to support the automatic generation of natural language. Broadly speaking, our work focuses on what we call “Flexible Information Presentation”: generating presentations that are appropriate to the context in which they occur. In particular, we are interested in achieving flexibility with respect to:
Our work is supported by funding from national research councils, European research agencies, and industry.
The aim of the CODA project (start date: July 1, 2009) is to develop the theory and technology for automatic transformation of text in monologue form to information-delivering dialogue, specifically dialogue between a 'layman' (e.g., patient or student) and an 'expert' (e.g., doctor or tutor).
The DataMIX project aims to explore more inclusive means of communicating data.
SWAT (Semantic Web Authoring Tool), a joint project with the School of Computer Science at the University of Manchester, is funded by the EPSRC. Its aim is to develop an NLG-based editing tool for the Semantic Web. An obstacle to realising the goals of the Semantic Web is that it depends technically on standardised formalisms for representing ontologies and factual data (OWL, RDF) that are inaccessible to most users. SWAT will establish principles for mapping between these formalisms and natural languages (e.g., English), so allowing ontologies and data to be defined and viewed through a transparent medium: interactive texts, generated automatically from the underlying encodings in OWL
The work is supported by an advisory board of researchers currently working on Semantic Web applications, with representatives from the World Health Organisation, Siemens, Ordnance Survey, the National Health Service, the Stanford Center for Biomedical Informatics Research, and the MyGrid project. Demonstrator applications will be developed in the domains of data workflow management, medical orders, and travel. The research draws on Manchester's expertise in the theory and application of ontologies, and the OU's experience in the development of knowledge-editing tools based on natural language generation.
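The kind of mapping between ontology formalisms and English that SWAT targets can be illustrated with a toy example. The following is a minimal sketch only, not SWAT's actual method: the tuple layout `(kind, subject, object)`, the two axiom kinds handled, and the function names `article` and `verbalise` are all assumptions made for illustration.

```python
from typing import Tuple

# Hypothetical, simplified axiom encoding: (kind, subject, object).
# Real OWL axioms are richer; this layout is an assumption for the sketch.
Axiom = Tuple[str, str, str]


def article(noun: str) -> str:
    """Pick 'a' or 'an' using a crude first-letter rule."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def verbalise(axiom: Axiom) -> str:
    """Render one simplified axiom as an English sentence."""
    kind, subject, obj = axiom
    if kind == "SubClassOf":
        # Class-level statement: every member of one class belongs to another.
        return f"Every {subject.lower()} is {article(obj)} {obj.lower()}."
    if kind == "ClassAssertion":
        # Statement about a named individual.
        return f"{subject} is {article(obj)} {obj.lower()}."
    raise ValueError(f"unsupported axiom kind: {kind}")


if __name__ == "__main__":
    for ax in [("SubClassOf", "Dog", "Animal"), ("ClassAssertion", "Rex", "Dog")]:
        print(verbalise(ax))
```

A template per axiom kind keeps the text predictable, which matters if the same text is also to be used for editing the underlying encoding.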
NLG Group Publications. To see publications for an individual, click on the link next to their name, below, or go to their personal website.
( Except where indicated, all e-mail addresses should end with @open.ac.uk )
Richard Doust, home page: personal, e-mail: richard.doust at free.fr
Sharon Moyo, e-mail: menziwa at hotmail.com
Tu Anh Nguyen, home page: official, e-mail: t.nguyen
Eva Banik, home page: personal
Prof. Donia Scott, home page: official, e-mail: D.R.Scott at sussex.ac.uk
Christian Pietsch, home page: personal
Dr. David Hardcastle, home page: personal
Dr. Catalina Hallett, home page: personal
Gaston Burek, home page: personal
Pascal Kuyten, home page: official
Susana Bautista Blasco, home page: official / personal, e-mail: S.B.Blasco
Prof. Ingrid Zukerman, home page: Monash University, Australia
Dr. Violeta S. T. D. B. Quental, home page: Pontifícia Universidade Católica do Rio de Janeiro, Brazil
Natalia Grabar, home page: SPIM/U729, Inserm, Paris
NumGen was a one-year scoping study that investigated how to express numerical quantities, especially proportions (fractions, percentages and ratios), for different audiences. We built a corpus of numerical expressions that was used for studies of numerical hedges and precision, and a system for planning deep semantic representations for proportions. We also carried out a number of user evaluation studies.
CLEF + CLEF-Services were a pair of related three-year projects funded by the Medical Research Council as part of the e-Science programme. The projects ran from October 2002 until the end of 2007. CLEF aimed to create a scalable, generic architecture for the capture and management of clinical and other descriptive data, integrated with genomic data and images and linked to literature and web resources. The project consortium was led by the University of Manchester Medical Informatics Group and brought together teams from UCL, the University of Sheffield, the Royal Marsden Hospital and the University of Cambridge. At the OU, we focussed on providing natural language generation technologies for assisting the creation and management of electronic patient records and for generating summaries of clinical data.
Semantic Interoperability and Data Mining in Biomedicine was a Network of Excellence (NoE) funded by the European Commission under Framework 6. The general objective of the network was to bridge gaps in European research infrastructure and to facilitate cross-fertilisation between scientific disciplines such as computer science, system engineering and medical/clinical research. The long-term goal of the network was the development of generic methods and tools supporting critical tasks in medical and biomedical informatics, such as data mining, knowledge discovery, knowledge representation, abstraction and indexing of information, semantic-based information retrieval in a complex and high-dimensional information space, and knowledge-based adaptive systems for provision of decision support for dissemination of evidence-based medicine. There were 26 work packages in this project. We were involved in a number of them, including the mobility programmes (WP6), the workshop/tutorial on Natural Language Processing in the Biomedical domain (WP13), the workshop/tutorial on Text Mining and Information Retrieval (WP15), the research activity on a multilingual medical dictionary (WP20) and the research activity on Ontology Engineering (WP21). See also: Mobility Programmes Homepage at OU.
HALO aimed to develop a "Digital Aristotle": a computer-based knowledge source for students and scientists, allowing scientific experts to formulate textbook knowledge in a knowledge base which students can consult by posing queries at the level of the American AP exams. The project was funded by Vulcan Inc. (Seattle), owned by Microsoft co-founder Paul G. Allen. Two teams were led by SRI International and the German company Ontoprise. The OU belonged to the "DarkMatter" team, led by Ontoprise, which also included Carnegie Mellon University, DFKI, Georgia Institute of Technology, and Intelligent Software Components S.A. (ISOCO). The OU's role in the project was to contribute to the Question Formation component, using a natural-language interface through which students posed queries to the DarkMatter system.
TUNA was a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence, for example, when the Knowledge Base uses 'names' such as '#Jones083' that a hearer/reader does not understand, or has concepts which do not have their own names (e.g., a specific tree or a chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. Existing algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (e.g., 'the black dog', 'the book on the table'). Our research is aimed at designing and implementing a new algorithm that generates appropriate descriptions in a far greater variety of situations. The algorithm will be more complete, and will generate expressions that are more appropriate because it will be based on empirical studies involving corpora and controlled experiments. Thus the project combines (psycho)linguistic, computational and logical challenges.
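The task of "inventing" a description that identifies a referent can be illustrated with the simplest case mentioned above, a conjunction of atomic properties. The following is a minimal sketch in the spirit of the well-known incremental approach to referring expression generation; it is not the TUNA algorithm, and the example domain, the attribute preference order, and the names `describe` and `realise` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, str]          # attribute name -> value
Description = List[Tuple[str, str]]

# Hypothetical knowledge base with opaque internal names.
DOMAIN: Dict[str, Entity] = {
    "#d1": {"type": "dog", "colour": "black"},
    "#d2": {"type": "dog", "colour": "white"},
    "#c1": {"type": "cat", "colour": "black"},
}


def describe(target_id: str, domain: Dict[str, Entity],
             preferred: List[str]) -> Tuple[Description, bool]:
    """Add properties in preference order until no distractors remain.

    Returns the chosen properties and a flag saying whether the
    description singles out the target.
    """
    target = domain[target_id]
    distractors = {eid for eid in domain if eid != target_id}
    description: Description = []
    for attr in preferred:
        if not distractors:
            break
        if attr not in target:
            continue
        value = target[attr]
        # Entities that do not share this value are excluded by the property.
        ruled_out = {e for e in distractors if domain[e].get(attr) != value}
        if ruled_out:
            description.append((attr, value))
            distractors -= ruled_out
    return description, not distractors


def realise(description: Description) -> str:
    """Very crude realisation: modifiers first, then the type as head noun."""
    head = next((v for a, v in description if a == "type"), "one")
    modifiers = [v for a, v in description if a != "type"]
    return " ".join(["the", *modifiers, head])


if __name__ == "__main__":
    props, ok = describe("#d1", DOMAIN, ["type", "colour"])
    print(realise(props), ok)
```

The sketch handles only atomic properties; relational descriptions such as 'the book on the table', and the wider variety of situations the project targets, are outside its scope.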
RAGS: A Reference Architecture for Generation Systems. Although the general problem of Natural Language Generation (NLG) remains far from completely understood, the field is starting to produce systems which achieve successful practical deployment. However, a significant barrier to wider exploitation is the lack of a standard view of the generation process as a whole, within which more specialised research can be embedded and against which whole systems can be compared. This project aimed to provide such a view: a 'reference architecture' for natural language generation systems based on an emerging consensus on what such systems should be like. The work aimed to refine this consensus into a more explicit reference architecture specification, identifying principal components and data representations. It has produced example data interfaces and sample implementations of processing modules. Although the resulting architecture may not be a perfect fit for all NLG applications, the intention is that it will be sufficiently general to facilitate sharing of resources and comparative evaluation of different approaches.
WYSIWYM: What You See Is What You Meant aimed to allow domain experts to encode their knowledge directly, by interacting with a feedback text, generated by the system, which presents the knowledge defined so far and the options for extending or revising it. Previous knowledge editors have provided a graphical interface so that users can interact with diagrams rather than writing code; WYSIWYM takes a step further, exploiting automatic text generation so that the user interacts with an ordinary natural language document rather than a relatively unfamiliar diagram. In some publications we have referred to this method as Conceptual Authoring.
PILLS: Patient Information Language Localisation System. The objective of the PILLS project was to facilitate the development of digital content for the European medical and pharmaceutical products industries through a multilingual authoring application designed to support various sectors, including pharmaceutical developers and manufacturers, health portal publishers, and healthcare eMarketplaces. The project used the Patient Information Leaflet Corpus, which is available here.
ICONOCLAST: Integrating Constraints on Layout and Style. In the ICONOCLAST project, we developed a framework for integrating constraints on the style and layout of the output documents in an NLG system. By interacting with the system, authors are able to determine the optimal set of constraints whose interaction will lead to the production of documents in the desired style and layout.
Drafter: A Drafting Assistant for Technical Writers is an interactive tool designed to assist technical authors in the production of English and French end-user manuals for software systems. Unlike current generation systems, which aim at the automated production of instructions and thus keep the authors out of the loop, DRAFTER is a support tool intended to be integrated in the technical author's working environment, hopefully automating some of the more tedious aspects of the authors' tasks.
GNOME: Generating Nominal Expressions. In the GNOME project, we developed general algorithms informed by corpus analysis and by the results of psycholinguistic studies on how people produce and understand nominal expressions. The resulting algorithms were implemented in the ICONOCLAST project and HCRC's ILEX system.
AGILE built a tool which allows a technical author to specify, in a non-linguistic representation, the 'content' of different tasks that can be performed by users of CAD-CAM software. The AGILE system can then automatically express these content specifications in styles appropriate to different sections of a CAD-CAM manual (procedures, ready reference ...) in Bulgarian, Czech and Russian. The generated texts are displayed in a browser as hyperlinked documents. No expertise in knowledge representation is required, although some training with the interface is needed.
Last updated 26 July 2010