
Lexical Systems & Tools

Project information

LHNCBC's Lexical Systems Group develops and maintains the SPECIALIST Lexicon and the tools that support and exploit it. The SPECIALIST Lexicon and NLP Tools are at the center of NLM's natural language research, providing a foundation for all our natural language processing efforts. In general, we investigate the contributions that natural language processing techniques can make to the task of mediating between the language of users and the language of online biomedical information resources. The SPECIALIST NLP Tools facilitate natural language processing by helping application developers with lexical variation and text analysis tasks in the biomedical domain.

Recently, the Lexical Systems Group began a project to enhance the derivational-variants function of the Lexical Tools. The derivational-variants function uses a set of derivational facts and rules to generate or identify derivational variants of input terms. Derivational variants are words related by a word-formation process such as suffixation, prefixation, or conversion (change of category). The current derivational-variants system has only suffix rules and facts, which are entered and curated by hand. To add prefixation and conversion functionality to the system, the PDM team has developed a method that automatically extracts candidate pairs of words that may be derivationally related, which helps automate the creation of the new rules and facts.
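As a rough illustration of the idea described above, the following Python sketch shows how hand-curated facts and suffix rules might be combined to produce derivational variants, and how candidate prefix pairs might be pulled from a lexicon. It is a minimal sketch under stated assumptions, not the Lexical Tools implementation: the names `SuffixRule`, `RULES`, `FACTS`, `LEXICON`, `derivational_variants`, and `candidate_pairs`, and all of the sample data, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    """Hypothetical suffix rule: swap one ending and category for another."""
    src_suffix: str
    src_cat: str
    dst_suffix: str
    dst_cat: str


# Illustrative data only; none of it is taken from the Lexical Tools files.
RULES = [
    SuffixRule("ness", "noun", "", "adj"),       # kindness -> kind
    SuffixRule("ation", "noun", "ate", "verb"),  # activation -> activate
]
FACTS = {("retina", "noun"): [("retinal", "adj")]}
LEXICON = {
    ("kind", "adj"), ("kindness", "noun"),
    ("activate", "verb"), ("activation", "noun"),
    ("retina", "noun"), ("retinal", "adj"),
    ("operative", "adj"), ("preoperative", "adj"),
}


def derivational_variants(word, cat, lexicon=LEXICON):
    """Return variants listed as facts, then rule-generated forms found in the lexicon."""
    variants = list(FACTS.get((word, cat), []))
    for rule in RULES:
        if cat == rule.src_cat and word.endswith(rule.src_suffix):
            stem = word[: len(word) - len(rule.src_suffix)]
            candidate = (stem + rule.dst_suffix, rule.dst_cat)
            # Keep a rule-generated form only if the lexicon contains it.
            if candidate in lexicon and candidate not in variants:
                variants.append(candidate)
    return variants


def candidate_pairs(prefix, lexicon=LEXICON):
    """Collect (base, prefixed) pairs where both forms are lexicon entries."""
    pairs = []
    for word, cat in sorted(lexicon):
        base = (word[len(prefix):], cat)
        if word.startswith(prefix) and base in lexicon:
            pairs.append((base, (word, cat)))
    return pairs
```

In this sketch, pairs returned by `candidate_pairs` are only candidates: a shared string prefix does not guarantee a derivational relationship, so a review step would still be needed before any pair is promoted to a fact or generalized into a rule.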

The SPECIALIST Lexicon and Lexical Tools are open source and freely downloadable. The 2012 release of the SPECIALIST Lexicon will contain over 462,000 records, representing over 830,000 forms, an increase of over 13,000 records from the 2011 release. Many of the new terms are derived from de-identified clinical records from our own De-identification project and from the MIMIC database.

Publications/Tools: 
Bhupatiraju R, Fung K, Bodenreider O. MetaMapLite in Excel: Biomedical named-entity recognition for non-technical users. Stud Health Technol Inform (Proc Medinfo): 1252.
Lu C, Tormey D, McCreedy L, Browne AC. Generating A Distilled N-Gram Set: Effective Lexical Multiword Building in the SPECIALIST Lexicon. The 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), Vol(5): HEALTHINF, Porto, Portugal, February 21-23, 2017, p. 77-87.
Kastrin A, Rindflesch TC, Hristovski D. Link prediction on a network of co-occurring MeSH terms: Towards literature-based discovery. Methods of Information in Medicine 55(4):340-6.
Lu C, Browne AC. Development of Sub-Term Mapping Tools (STMT) (Poster). AMIA 2012 Annual Symposium, Chicago, IL, November 3-7, 2012, p. 1845 (AMIA 2012 Distinguished Poster Award).
Lu C, McCreedy L, Tormey D, Browne AC. A Systematic Approach for Automatically Generating Derivational Variants in Lexical Tools Based on the SPECIALIST Lexicon. IEEE IT Professional Magazine, May/June 2012, p. 36-42.
Lu C, Browne AC. Converting Unicode Lexicon and Lexical Tools for ASCII NLP Applications. AMIA Annu Symp Proc 2011:1870.
Lu C, Divita G, Browne AC. Development of Visual Tagging Tool. AMIA Annu Symp Proc 2010:1156.
Kilicoglu H, Fiszman M, Rosemblat G, Marimpietri S, Rindflesch TC. Arguments of Nominals in Semantic Interpretation of Biomedical Text. BioNLP Workshop Proc, Assoc. for Computational Linguistics, 2010.
