Quality Assurance in Biomedical Terminologies and Ontologies.

Bodenreider O

April 2010 Technical Report to the LHNCBC Board of Scientific Counselors.


Biomedical terminologies and ontologies are enabling resources for clinical decision support systems and for data integration systems in translational research. Therefore, the quality of these resources has a direct impact on healthcare and biomedical research. In the past few years, quality assurance (QA) of biomedical terminologies and ontologies has become a key issue in the development of standard terminologies, such as SNOMED CT, and has emerged as an active field of research. Approaches to quality assurance include lexical, structural, semantic and statistical techniques applied to individual biomedical terminologies and ontologies, as well as techniques for comparing and contrasting them. In this report, we review 36 studies with a quality assurance component performed in our research group over the past twelve years. About half of these studies focus primarily on quality assurance in terminologies. In the other half, quality assurance is generally an application of the method developed in the study. As it is neither possible nor desirable to report each study in detail, we first present an overview of the 36 studies, using the analytical framework presented in. We then select four studies representative of the range of methods developed and present them in more detail. For the purpose of this report, we use "biomedical terminologies and ontologies" as a generic term for the various kinds of artifacts available for representing the names, meaning and usage of biomedical entities. 
Ontologies typically define types of entities and their relations (e.g., the Foundational Model of Anatomy (FMA)); terminologies tend to focus on naming (e.g., the list of official gene names and symbols established by the HUGO Gene Nomenclature Committee); thesauri organize entities for a given purpose (e.g., the Medical Subject Headings (MeSH), created for indexing the biomedical literature); classifications allow users to place entities in non-overlapping classes (e.g., the International Classification of Diseases); and knowledge bases incorporate assertional knowledge (e.g., quinine treats malaria) in addition to the definitional knowledge found in ontologies (e.g., pneumonia has location lung). In many cases, however, the distinction among these categories of artifacts is not sharp. For example, some ontologies also collect names for the entities they represent (e.g., the FMA collects synonyms and names in languages other than English). Conversely, most terminologies are not mere collections of terms, but are organized into hierarchies denoting relations among entities. Finally, the very name of some of these artifacts is misleading. For example, despite its name, the Gene Ontology is mostly a controlled vocabulary for the annotation of gene products. For these reasons, we do not attempt to distinguish between terminologies and ontologies when we refer to the artifacts we analyzed.