This directory contains version 2.0 of the Patient Information Leaflet corpus,
a collection of several hundred documents giving instructions to patients
about their medication. The corpus was originally created from the ABPI
compendium of patient information leaflets by manual scanning and conversion.
Documents are available in RTF, DOC and HTML formats, and are also marked up
with logical structure using a specially created SGML DTD.
The corpus is organised in the following versions:
The base corpus, consisting of all 595 documents originally processed;
The PIL corpus, consisting of a subset of 471 documents after removal of
non-PIL documents and near duplicates. (This is the corpus recommended for
general use.)
The PIL corpus was initially developed as part of the ICONOCLAST
project, supported by the EPSRC (grant no. L77102).
Release notes
March 2006
Version 2.0
Tidied up for general release by Roger Evans
Nov 2000
Version 1.0
Initial internal release by Nadjet Bouayad-Agha
Projects using this resource
The following is a list of projects we know about that have made use of the
PIL corpus. If you know of other uses not in this list, please let us know by
email.
2000: The ICONOCLAST project (ITRI, Brighton) originated this corpus and used
it in work on constraint-based generation in different styles.
2001: The PILLS
project (ITRI, Brighton; IMI Freiburg; Berlitz) used the corpus in the
development of a multilingual authoring tool for patient information leaflets.
2001: The RAGS project (ITRI, Brighton) used PIL data as the target for its
RICHES demonstration generator.
2000-2004: Daniel Paiva's PhD thesis work (ITRI, Brighton) used the corpus for
work on stylistic control of generation.
2004: The COGENT project
(ITRI, Brighton; Informatics, Sussex) is using the corpus in its work on
wide-coverage generation.