Current interests

Semantic Web Authoring

My main interest at present is to find effective methods for opening up the semantic web to a wide range of users, through tools that allow domain experts to define (and view) ontologies and datasets using familiar modes of presentation -- in particular, controlled natural languages.

Our intention in the SWAT project is to pursue this objective by applying various technologies from Natural Language Generation. These include WYSIWYM knowledge authoring, and the generation of large organised documents (or hypertexts) to present complex ontologies/data.

Assigning rhetorical relations

Arising from work on ontology verbalisation in SWAT, I have carried out a theoretical investigation (and a related empirical study) on assigning a rhetorical relationship (including 'unrelated') to any pair of statements from an ontology. Some papers and further materials are given here.

Generating numerical expressions

I have made some contributions to Sandra Williams's NUMGEN project, which investigates the generation of appropriate numerical expressions describing proportions. In particular, I am interested in approximations like 'a half' or 'about 25%'. A number of authors have considered why such approximations are used (e.g., with reference to Grice's maxims of quantity and quality); I am more interested in how appropriate approximations are planned (e.g., in the choice of a suitable round number).
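As a rough illustration of the planning problem (not the NUMGEN method itself), the choice of a round number can be sketched in Python. The inventory of round fractions, the 0.03 tolerance, and the fallback to multiples of 5% are all assumptions made for this example.

```python
from fractions import Fraction

# Hypothetical inventory of "round" fractions and their English phrasing.
ROUND_FRACTIONS = {
    Fraction(1, 4): "a quarter",
    Fraction(1, 3): "a third",
    Fraction(1, 2): "a half",
    Fraction(2, 3): "two thirds",
    Fraction(3, 4): "three quarters",
}


def approximate(proportion, tolerance=0.03):
    """Return a round-number phrase for a proportion between 0 and 1."""
    # Find the round fraction closest to the proportion.
    target, phrase = min(
        ROUND_FRACTIONS.items(),
        key=lambda item: abs(float(item[0]) - proportion),
    )
    error = abs(float(target) - proportion)
    if error < 1e-9:
        return phrase                 # exact match, no hedge needed
    if error <= tolerance:
        return "about " + phrase      # close enough to hedge with "about"
    # Otherwise fall back to the nearest multiple of 5%.
    percent = 5 * round(proportion * 100 / 5)
    if abs(percent - proportion * 100) < 1e-9:
        return f"{percent}%"
    return f"about {percent}%"


for p in (0.5, 0.49, 0.26, 0.87):
    print(p, "->", approximate(p))
```

Even this toy version shows the decisions a planner has to make: which round numbers count as candidates, how much error is acceptable, and when a hedge such as 'about' is required.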

Abstract verbs

We learn at school that verbs describe events or actions. In fact, many verbs do nothing of the sort: their purpose is to describe abstract relationships. Thus verbs like 'induce' and 'suggest' can denote discourse relations between events or propositions (e.g., relations like CAUSE or EVIDENCE); verbs like 'perform' and 'undergo' can denote thematic relations between an event and one of its participants (e.g., AGENT, PATIENT); and verbs like 'doubled' and 'fell' can denote arithmetical relations between numbers. I believe that studying these verbs can provide interesting insights into the relations themselves as well as widening the scope of Natural Language Generation.

Previous projects

1993-96: GIST, DRAFTER

The ITRI participated in two multilingual generation projects with a similar architecture. In GIST the domain was administrative forms, and the languages were English, Italian, and German; in DRAFTER the domain was software manuals, and the languages were English and French. A challenge in both projects was to find a way in which domain experts could author the input to the generator without expertise in a knowledge representation formalism. My work on this problem led to the invention of WYSIWYM.

1996-2005: WYSIWYM

WYSIWYM (What You See Is What You Meant) is based on the paradoxical idea that an NLG system can be deployed to obtain its own input. The user authors content in an underlying knowledge formalism by interacting with a generated text which expresses (a) the knowledge so far defined and (b) the options for extending or revising this knowledge. Options are derived from a fixed ontology, and presented in natural language through menus that pop up on the text. WYSIWYM has been applied in several of our projects (CLIME, PILLS, CLEF, CLEF-Services, HALO) and also in projects by other groups.
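The editing cycle can be illustrated with a toy sketch in Python. This is not the actual WYSIWYM implementation: the ontology, the sentence template, and the function names (render, options, choose) are all hypothetical, chosen only to show how a generated text can display both the knowledge so far and the places where it can be extended.

```python
# Hypothetical fixed ontology: each concept lists its slots and filler types.
ONTOLOGY = {
    "take": {"agent": "person", "object": "medicine"},
}

# Hypothetical fillers available for each type.
FILLERS = {
    "person": ["the patient", "the doctor"],
    "medicine": ["the tablet", "the syrup"],
}

# Hypothetical sentence template for each concept.
TEMPLATES = {
    "take": "{agent} takes {object}",
}


def render(concept, filled):
    """Generate feedback text; unfilled slots appear as bracketed anchors."""
    slots = ONTOLOGY[concept]
    values = {
        slot: filled.get(slot, f"[some {kind}]")
        for slot, kind in slots.items()
    }
    return TEMPLATES[concept].format(**values)


def options(concept, slot):
    """Menu options for an anchor, derived from the ontology."""
    return FILLERS[ONTOLOGY[concept][slot]]


def choose(filled, slot, value):
    """Return the knowledge extended with the user's menu choice."""
    return {**filled, slot: value}


knowledge = {}
print(render("take", knowledge))     # both slots shown as anchors
print(options("take", "agent"))      # menu that would pop up on the anchor
knowledge = choose(knowledge, "agent", "the patient")
print(render("take", knowledge))     # text regenerated after the choice
```

The user never sees the underlying dictionary: each choice updates the knowledge, and the text is regenerated to reflect it.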

1997-2000: ICONOCLAST

ICONOCLAST (Integrating CONstraints On Content, Layout And STyle) applied constraint satisfaction techniques (specifically, constraint logic programming) in order to explore large solution spaces in NLG and thus allow variations in style. I collaborated on this project with Donia Scott and Nadjet Bouayad-Agha, and among other things it led to the concept of document structure expounded in our 2003 paper.

2000-2001: PILLS

The European Commission funded PILLS (Patient Information Language Localisation System) as a feasibility study on applying WYSIWYM technology to the production of multilingual technical documentation in the pharmaceutical sector. We collaborated with Berlitz and the University of Freiburg, Germany. The system generated three types of pharmaceutical document (intended for patients, doctors, and regulators) in three languages (English, French, German).

2002-2005: CLEF

Funded by the MRC, CLEF (CLinical E-science Framework) developed methods for representing electronic patient records in machine-usable form, and exploiting them in order to answer queries (for the benefit of medical researchers) and generate summaries (for the benefit of clinicians). With Catalina Hallett and Donia Scott I contributed to two parts of the project: editing of queries, using a WYSIWYM interface; and generation of summaries. The work was continued in a second project, CLEF-Services (2004-2008).

2003-2007: TUNA

I was a co-investigator (with Kees van Deemter) on TUNA (Towards a UNified Algorithm for the generation of referring expressions), which led eventually to the construction of a semantically aligned corpus employed in a "Generation Challenge".

2004-2006: HALO-2

Funded by industry, HALO-2 brought together several teams in an effort to encode knowledge from scientific textbooks in a form that could be queried by students. With Ondrej Pacovsky I contributed to a WYSIWYM interface allowing queries to be formulated in F-Logic, the formalism developed by our team leaders, Ontoprise.