Discovering Missed Synonymy in a Large Concept-Oriented Metathesaurus
The Unified Medical Language System (UMLS) [1, 2] Metathesaurus is concept-oriented; its goal is to unite all names with identical meaning in a single Concept. The names come from its constituent vocabularies, or "sources," a wide variety of biomedical terminologies including many controlled vocabularies and classifications used in patient records, administrative health data, and bibliographic, research, full-text, and expert systems. Many sources offer little definitional information, and many are not themselves concept-oriented, so identifying synonymy is a challenging semantic task. The rapidly increasing size of the Metathesaurus makes the task daunting and demands effective computational support: the January 2000 release contains more than 1.5 million names for 730,000 concepts. Vocabularies are added and updated using sophisticated lexical matching, selective algorithms, and expert review [4, 5, 6]. Yet the result is imperfect; we have discovered and corrected missed synonymy in approximately 1% of previously released concepts each year. This paper reviews general methods for finding missed synonymy and describes several specific novel approaches that we have found effective.
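To make the notion of missed synonymy concrete, the following minimal Python sketch flags names that normalize to the same string yet sit in different concepts. It is an illustration only, not the method used to build the Metathesaurus: the `normalize` function is a crude stand-in for real lexical normalization, and the concept identifiers (`X1`, `X2`, `X3`) and sample names are hypothetical.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort the words.

    A crude, assumed stand-in for lexical normalization; real systems
    also handle inflection, stop words, and spelling variants.
    """
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(words))


def candidate_missed_synonymy(names):
    """Return normalized keys whose names are spread over several concepts.

    `names` is an iterable of (concept_id, name) pairs. Each returned key
    maps to the sorted list of concept ids sharing that normalized form;
    these are only candidates and would still need expert review.
    """
    concepts_by_key = defaultdict(set)
    for concept_id, name in names:
        concepts_by_key[normalize(name)].add(concept_id)
    return {
        key: sorted(ids)
        for key, ids in concepts_by_key.items()
        if len(ids) > 1
    }


# Hypothetical sample data: X1 and X2 hold word-order variants of one name.
sample = [
    ("X1", "Myocardial infarction"),
    ("X1", "Heart attack"),
    ("X2", "Infarction, myocardial"),
    ("X3", "Renal failure"),
]

print(candidate_missed_synonymy(sample))
# → {'infarction myocardial': ['X1', 'X2']}
```

A match of this kind only nominates a pair of concepts for review; names that agree lexically can still differ in meaning, and true synonyms such as "Heart attack" share no words with their counterparts and would not be caught this way.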