Converting Unicode Lexicon and Lexical Tools for ASCII NLP Applications
The NLP (Natural Language Processing) SPECIALIST Lexicon and Lexical Tools, distributed by the National Library of Medicine (NLM), have been released in Unicode (UTF-8) format since 2006. In NLP projects, the Lexicon is used as a corpus and the Lexical Tools are used as software packages. Some NLP projects still handle only 7-bit ASCII characters. This paper describes how to convert the UTF-8 Lexicon and integrate the Lexical Tools into a pure ASCII NLP project, MetaMap.
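As a rough illustration of what converting UTF-8 text to 7-bit ASCII can involve, the following Python sketch decomposes each character with Unicode NFKD normalization, drops combining marks such as accents, and substitutes a placeholder for anything that still falls outside ASCII. This is an assumption-laden sketch and not the conversion used by the Lexical Tools or MetaMap; the function names `to_ascii` and `is_ascii` and the `?` placeholder are hypothetical choices made for this example.

```python
import unicodedata


def is_ascii(text: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in text)


def to_ascii(text: str, placeholder: str = "?") -> str:
    """Approximate a Unicode string with 7-bit ASCII characters.

    Hypothetical helper for illustration only:
    1. NFKD normalization splits accented letters into base letter plus
       combining mark, and expands compatibility forms such as ligatures.
    2. Combining marks are dropped.
    3. Any remaining non-ASCII character is replaced by `placeholder`.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    result = []
    for ch in decomposed:
        if ord(ch) < 128:
            result.append(ch)            # already ASCII, keep as-is
        elif unicodedata.combining(ch):
            continue                     # strip accents and other marks
        else:
            result.append(placeholder)   # no ASCII equivalent found
    return "".join(result)


if __name__ == "__main__":
    for term in ["résumé", "Sjögren", "naïve"]:
        print(term, "->", to_ascii(term))
```

A real lexicon conversion would likely need a curated mapping table for symbols that have no decomposition (for example, Greek letters spelled out as words) rather than a single placeholder.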