Semantic Mining Research Student Poster Session 2004

	Research Student Poster Session at the Semantic Interoperability and Data Mining in Biomedicine Summer School and Workshop
Location	Hotel Füred, Balatonfüred, Hungary
Organization	WP6 Mobility Programmes (Paul Piwek at ITRI, University of Brighton) with WP7 Summer School 2004 (MEDINFO) (EU 6FW Program IST 507505).
Practicalities	Poster presenters will be provided with a poster board 120 cm wide and 160 cm high; information on the exact location of the poster boards will be provided upon arrival at the summer school. Posters will be on display for the full duration of the summer school. Additionally, there will be a poster session on the evening of July 7 (from 18:00). At this session, poster presenters are requested to stand next to their poster and provide information on their work to the other attendees of the summer school. All participants of the summer school are invited to come to the poster session and use it as an opportunity to meet the poster contributors in person and ask them for further information.
Presenters and titles	Albert Gatt (ITRI, University of Brighton), Empirical issues in the Generation of Referring Expressions
Marina Santini (ITRI, University of Brighton), Identification of Genres on the Web
Albrecht Zaiss (Department of Medical Informatics, University Hospital Freiburg), Gesa Weske-Heck, Susanne Hanser, Stefan Schulz, Wolfgang Giere, Michael Schopen, Rüdiger Klar, The German Specialist Lexicon
Mikael Nyström (Medical Informatics, Department of Biomedical Engineering, Linköping University), Terminology Systems and Electronic Health Records
Kornel Marko (Freiburg University, Computational Linguistics), Cross-language Indexing and Retrieval with MorphoSaurus
Rahil Qamar (University of Manchester), Validating the Health Content Level of Archetype Models using OWL and XML
Gábor Balázs and András Jókuthy (University of Veszprem, Dept. of Information Systems), Data mining in an osteoporosis database
Christel Le Bozec (ERM 202, INSERM Paris), Similarity measures for consensus in pathology
Marijke Keet (Laboratory of Applied Ontology, Trento and Napier University, School of Computing), On ontology integration for applied bioscience
Yin Ling (ITRI, University of Brighton), Question analysis and answering procedural questions
Frederic Ehrler (University of Geneva), Passage Retrieval and Data-Poor Text Categorization with the Gene Ontology
Amir Reza Razavi (Linköping University, Dept. of Biomedical Engineering), From Data to Knowledge in the Oncology Domain
Audrey Baneyx (INSERM ERM 202 - Laboratoire SPIM, Paris), Building Medical Ontologies by Terminology Extraction from Texts: Pneumology, a sample methodology
Susanne Hanser (Department of Medical Informatics, University Hospital Freiburg), A feasibility study for a German Procedure Coding System - How to build a well structured procedure classification from an insufficient coding system
Sebastian Brandt (TU Dresden), Title tba
Jozsef Barcza (Dept. of Information Systems at the Univ. of Veszprem), Title tba
Jason Teeple (ITRI, University of Brighton), Title tba
Thapelo Otlogetswe (ITRI, University of Brighton), Title tba
Rong Chen and Tore Fjaertoft (Karolinska Institute), Title tba
Abstracts	C. Maria Keet, Ontology integration for applied bioscience
Keywords: ontology integration, context dependency, biology, applied bioscience
Ontology development in applied bioscience domains, such as biomedicine or food science, blends concepts from various core sciences that are valid only in particular contexts within the chosen domain or parts of it; this may require the reuse and integration of segments of various existing ontologies. This is illustrated with an example from bottom-up ontology development for bacteriocin-related knowledge. Because it is important first to determine which kind of integration is required for which type of goal, a literature review was carried out, resulting in a preliminary categorization of types of ontology integration, together with a list of factors and properties that distinguish the multiple methods of ‘integration’. Several challenges are highlighted: the potential for developing an ontology library that might facilitate reuse and integration; how to address context-dependent data, where the same basic concepts reappear but with other constraints; and options for modularization that abstract away details that may not be relevant in the particular situation to be ontologised.

Albert Gatt, Empirical issues in the Generation of Referring Expressions
Research on the generation of referring expressions (GRE) has tended to view reference as a semantic problem, seeking to generate descriptions that satisfy the uniqueness criterion. The almost exclusive focus on this criterion has led to impoverished models that neglect cognitive criteria of success, namely: constraints on the complexity and ease of comprehension of a description; linguistic and perceptual constraints on combinations of attributes; and domain structuring during the search for identifying properties of entities.
This poster outlines some of the challenges to GRE arising from psycholinguistic and linguistic studies of reference in communicative settings, focusing in particular on the generation of descriptions containing Boolean operators (logical disjunction/linguistic conjunction). It details current corpus-based work showing that linguistic conjunction is governed by constraints on semantic similarity and ‘coherent information packaging’, and outlines the prospects for future work in this direction.

Amir Reza Razavi, From Data to Knowledge in the Oncology Domain
Keywords: decision support system, oncology, data mining, computer-interpretable guidelines
Decision making in medicine is based on expert knowledge. Transforming this knowledge into algorithms and computer-interpretable guidelines (CIGs) is increasingly used to support decision making. In addition, data stored in large medical databases can be processed and used as a source for promoting best medical practice. This research project aims to develop a decision support system for the oncology domain. When oncologists encounter new patients, the system provides them with a combination of information about similar previous cases (past experience) and advice from guidelines (expert knowledge). Past experience is extracted by analyzing medical registers with a data mining technique; expert knowledge is applied through CIGs.

Mikael Nyström, Terminology Systems and Electronic Health Records
Keywords: semantic electronic health record system, medical terminology systems, SNOMED CT, standards, ENV 13606
Our research covers the design, implementation, information representation and use of terminology-based semantic electronic health record systems in Swedish health care. One ongoing study uses ENV 13606 and SNOMED CT as the basis for an information model and a description model for an electronic health record system.
Another ongoing study aims to translate SNOMED CT semi-automatically from English to Swedish, based on the already translated medical terminology systems ICD-10, ICD-10-P, ICF, NCSP and MeSH. Future work includes studying the relation between the terms used in health records, national health care registries and terminology systems, as well as the information quality of coded patient information. A further aim is the design, implementation and evaluation of terminology services as the basis for structured data entry in the electronic health record.

Christel Le Bozec, Similarity measures for consensus in pathology
Keywords: multimedia knowledge representation; ontology; semantic similarity; similarity measure; computer-assisted consensus; breast pathology
Computer-assisted consensus in medical imaging implies automatic comparison of unambiguous morphological abnormalities. In the field of breast pathology, we modelled virtual slide description and inter-observer consensus to build an ontology of morphological abnormalities. We implemented position-based, content-based and mixed similarity measures between concepts and evaluated their results against experts' judgment. Morphological abnormalities extracted from published grading systems, medical reports and existing terminologies were organized in an is-a hierarchy and linked by the relation "is a diagnostic criterion of" according to their diagnostic meaning. The position-based similarity measure using both taxonomic and non-taxonomic relations performed as well as the other measures.

Marina Santini, Identification of Genres on the Web
Keywords: genre, cybergenres, web genres, text types, text typology, facets, web documents, web pages
Texts about the same topic can belong to different genres and provide different kinds of information; for example, a promotional article on MS Office provides different information from the user's manual of the same product.
Genre identification could be extremely useful on the Web, and an immediate benefit could be the reduction of information overload. However, genres on the Web are special types of genres, with their own characteristics and problems. To account for these characteristics and cope with the specific problems of Web documents, we propose the use of rhetorical text types for Web genre classification, and suggest a theoretical model based on facets. In our model, text types can be seen as coarse genre classes. Each text type is a combination of facets; facets are macro-features hosting computationally tractable surface cues, i.e. measurable features extracted from Web documents.

Audrey Baneyx, Building Medical Ontologies by Terminology Extraction from Texts: Pneumology, a sample methodology
Keywords: Ontology building from corpora, Differential ontology, Methodology in four steps
The development of terminological and ontological resources from corpora is necessary for the expected interoperability of health data systems within the Semantic Web. The objective of the present work is twofold. First, we suggest and describe a four-step methodology for building ontologies. The approach is based on the principles of differential semantics and allows us to go from a textual corpus to computerized ontologies while keeping maximum independence from the syntax and semantics of the field terms. Second, we develop an ontology in the medical domain of pneumology. This methodology preserves the causal links that lead from medical hypothesis to diagnosis, so that physicians have at their disposal a trusted measure of the diagnosis. Natural language processing tools are used to build this robust ontology.
Gábor Balázs and András Jókuthy, Data mining in an osteoporosis database
Keywords: Data mining, non-profit, web portal
The objective of the project is to create a Competence Center for data analysis that will provide a non-profit, intelligent data mining service for the Western region of Hungary, which comprises several counties. The service will be made available via the Internet to researchers active in a wide range of scientific fields and will offer the capability to analyze scientific observation data. The research and development work has archaeological, seismological and medical applications. The most important is the medical application, which is based on an existing medical database containing data about osteoporosis cases.

Kornel Marko, Cross-Language Document Indexing and Retrieval with MorphoSaurus
Keywords: Cross-Language Information Retrieval (CLIR), Morphological Processing, Thesauri
We introduce an interlingua-based approach to cross-language information retrieval, in which queries, as well as documents, are mapped onto a language-independent concept layer on which retrieval operations are performed. This approach is contrasted with one that directly translates non-English queries (German and Portuguese, in our experiments) into English queries, which are subsequently processed against English documents. We report on the empirical evaluation of both approaches on a large medical document collection (the Ohsumed corpus).

Yin Ling, Question analysis and answering procedural questions
Keywords: Information Retrieval, Question Answering, Procedural Questions
In this poster, we present an analysis of question structure from a particular angle. We divide a question into a generic part and a specific part, and further argue that these two parts should be treated differently when retrieving and extracting answers.
The specific part often specifies the context of discourse and reappears in the answer, while the generic part, which defines a class of relevant but unknown information, is replaced by detailed facts in the answer. Therefore, information relevant to the generic part is more difficult to locate than information relevant to the specific part. As examples, we analyze some procedural questions and their answers, and introduce some preliminary methods of answer extraction for procedural questions.

Frederic Ehrler, Passage Retrieval and Data-Poor Text Categorization with the Gene Ontology
Keywords: Passage Retrieval, Text Categorization, Gene Ontology
The poster presents experiments on a passage retrieval task and a categorization task, both of which involve Gene Ontology categories. The passage retrieval task consists of finding the relevant passage in a text given its annotation. The categorization task consists of finding the relevant category given a document. The poster is divided into four parts: the data, with some statistics about the Gene Ontology; the methods used to solve both tasks; an overview of the results; and finally the future work still to be done.

Rahil Qamar, Validating the Health Content Level of Archetype Models using OWL and XML
Keywords: Ontology, Archetype Model, EHR, OWL, ADL, Health Information Systems
The emergence of IT in the health sector has added a new dimension to patient care. One of the more recent approaches has been to integrate various knowledge and data models to ensure that valid data are recorded in Electronic Health Record (EHR) systems. Two such knowledge methodologies currently in use are ontologies and Archetype Models (AMs), which, if made to work together, can bring significant benefits to the way EHRs are created and queried across various health domains.
My research involves creating ontologies that model specific health concepts; these will then be used to validate the Content Level of AMs, ensuring the construction of structured and valid datasets within EHRs and ultimately improving the quality of patient care provided by health institutions.

Susanne Hanser, A feasibility study for a German Procedure Coding System - How to build a well structured procedure classification from an insufficient coding system
Keywords: Coding System, Procedure Classification, OPS-301, ICD-10-PCS, CCAM
With the aim of replacing the current German classification of medical procedures (OPS-301) with a well-structured classification that is easy to expand and well suited to statistical purposes, a feasibility study evaluated whether an optimized German procedure coding system could be developed by representing the semantics of OPS-301 codes using the classification structure of ICD-10-PCS and/or CCAM. Quantifying how often, and at what granularity, the contents of about 600 OPS-301 codes could be represented showed better results for CCAM. Together with qualitative findings (e.g. the use of medical language, the avoidance of multiple coding), this led to the recommendation to develop a new coding system based on the CCAM architecture. A pilot version, MPS (Medizinischer Prozedurenschlüssel), confirmed the anticipated advantages: representing the content of OPS-301 with the CCAM methodology yields a semantically more precise and user-friendly procedure classification. Information already coded remains usable, and the introduction of the new coding system is also facilitated by an automatically generated 1:1 mapping table between OPS-301 and MPS.
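Christel Le Bozec's abstract compares position-based similarity measures over an is-a hierarchy without stating a formula. As a hedged illustration only, the Python sketch below implements one well-known position-based measure, Wu and Palmer's, on a tiny taxonomy. The concept names and the hierarchy are hypothetical and invented for this example; nothing here should be read as the measure or the ontology actually used in the poster.

```python
from typing import Dict, List, Optional

# Hypothetical toy is-a hierarchy: each concept maps to its single parent
# (None marks the root). This is NOT the poster's ontology.
PARENT: Dict[str, Optional[str]] = {
    "abnormality": None,
    "architectural_abnormality": "abnormality",
    "cytological_abnormality": "abnormality",
    "tubule_formation": "architectural_abnormality",
    "solid_pattern": "architectural_abnormality",
    "nuclear_pleomorphism": "cytological_abnormality",
}


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, starting with the concept itself."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that subsumes (or equals) both a and b."""
    b_ancestors = set(ancestors(b))
    # ancestors(a) walks upward from a, so the first shared node is the deepest.
    for node in ancestors(a):
        if node in b_ancestors:
            return node
    raise ValueError(f"{a!r} and {b!r} share no common ancestor")


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    # Siblings under the same intermediate parent: 2*2 / (3+3) = 2/3.
    print(wu_palmer("tubule_formation", "solid_pattern"))
    # Concepts that meet only at the root: 2*1 / (3+3) = 1/3.
    print(wu_palmer("tubule_formation", "nuclear_pleomorphism"))
```

In this toy hierarchy, two concepts sharing an intermediate parent score 2/3, while two that meet only at the root score 1/3. The abstract reports that the best position-based measure also used non-taxonomic relations such as "is a diagnostic criterion of"; this taxonomy-only sketch ignores those links.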
	Last Modified on July 1 2004 by