
Text Categorization

Semantic Type Indexing

STI (Semantic Type Indexing) uses the JDI (Journal Descriptor Indexing) methodology as the basis for calculating average ST scores from the Word-ST table. STs (Semantic Types) are a set of 135 categories in the Semantic Network of NLM's Unified Medical Language System (UMLS). Concepts in the UMLS Metathesaurus are assigned one or more STs, each of which forms an "isa" link from the concept to the ST. For example, the Metathesaurus concept Aspirin is assigned the STs Pharmacologic Substance and Organic Chemical. The set of UMLS Metathesaurus concepts assigned to an ST can be regarded as an "ST-document". In other words, each ST-document is composed of the UMLS Metathesaurus strings belonging to that ST. Word-ST tables are generated by the following steps:

  • Generate ST-Documents from the UMLS Metathesaurus (ST Concepts|Words)
  • Use JDI to index the ST-Documents (ST|JD|Wc|Dc)
  • Use the cosine coefficient between the JDI of the ST-Documents (ST|JD|Wc|Dc) and the JDI of individual training set words (Word|JD|Wc|Dc) to get Word|ST|Wc|Dc.
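
The last step above can be sketched as a cosine between two journal descriptor (JD) score vectors. This is only a minimal illustration, not the STI implementation: the function name `cosine_coefficient`, the JD labels, and all scores are invented for the example. It assumes each ST-Document and each word has already been indexed by JDI into a mapping from JD to score, and it uses one generic score per JD, whereas the tables above carry separate Wc and Dc columns.

```python
import math


def cosine_coefficient(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(score * b.get(jd, 0.0) for jd, score in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty or all-zero vector has no direction; treat as no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical JDI output for two ST-Documents (ST -> {JD: score}).
st_jd_scores = {
    "Pharmacologic Substance": {"Pharmacology": 0.8, "Cardiology": 0.2},
    "Organic Chemical": {"Chemistry": 0.9, "Pharmacology": 0.3},
}

# Hypothetical JDI output for one training set word (JD -> score).
word_jd_scores = {"Pharmacology": 0.7, "Chemistry": 0.1}

# One row of a Word-ST table: the word's similarity to each ST.
word_st_row = {
    st: cosine_coefficient(word_jd_scores, jd_scores)
    for st, jd_scores in st_jd_scores.items()
}
```

Under these made-up numbers the word scores higher against Pharmacologic Substance than against Organic Chemical, because its JD profile overlaps more with that ST-Document.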

The STI program and the MEDLINE Tokenizer are used together to index MEDLINE records on:

  • Text: phrases, titles, abstracts, and combinations of titles and abstracts
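
As described at the start of this section, the ST scores for a text are averages of per-word scores taken from the Word-ST table. The sketch below shows that averaging under simplifying assumptions: the whitespace split stands in for the MEDLINE Tokenizer, the `word_st` table and its values are invented, and words missing from the table are simply skipped (how STI itself treats them is not stated here). The name `average_st_scores` is hypothetical.

```python
def average_st_scores(text, word_st):
    """Average the Word-ST rows of the words in `text` that appear in the table."""
    tokens = text.lower().split()  # stand-in for a real tokenizer
    totals = {}
    matched = 0
    for token in tokens:
        row = word_st.get(token)
        if row is None:
            continue  # assumption: unknown words do not contribute
        matched += 1
        for st, score in row.items():
            totals[st] = totals.get(st, 0.0) + score
    if matched == 0:
        return {}
    return {st: total / matched for st, total in totals.items()}


# Hypothetical Word-ST table (word -> {ST: score}).
word_st = {
    "aspirin": {"Pharmacologic Substance": 0.9, "Organic Chemical": 0.7},
    "therapy": {"Pharmacologic Substance": 0.5, "Organic Chemical": 0.1},
}

scores = average_st_scores("Aspirin therapy in adults", word_st)

# Rank STs from highest to lowest average score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The same function could be applied to a phrase, a title, an abstract, or a title and abstract concatenated, since it only depends on the tokens it receives.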

I. Java Software Components:

II. Programs: