
Text Categorization

Semantic Types Indexing (Real-Time)

STRI (Semantic Type Real-Time Indexing) uses statistical associations between the words in a training set of MEDLINE citations and a small set of 135 categories, the Semantic Types (STs), in the Semantic Network of NLM's UMLS Metathesaurus. Similar to STI, this method uses ST-documents, each composed of the UMLS Metathesaurus strings belonging to one ST, but it calculates the ST scores in real time instead of pre-calculating them. The procedure is briefly described as follows:

  • Calculate the JDI scores (Words|JD|Wc|Dc) for the input text or MeSH terms.
  • Read in the ST-JD scores (ST|JD|Wc|Dc) from a file.
  • Calculate the vector similarity between the JDI scores of the input (Words|JD|Wc|Dc) and the ST-JD scores (ST|JD|Wc|Dc) using the cosine coefficient to obtain the ST scores (Words|ST|Wc|Dc).
  • Sort and display the results.
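
The similarity and sorting steps above can be sketched in Java as follows. This is a minimal illustration, not the STRI source code: the class name StriSketch, the methods cosine and rank, and the map-based vector layout are all assumptions made for the example. It treats the input's JDI scores and each ST's ST-JD scores as sparse vectors keyed by JD name, handling one score column (for example Wc or Dc) at a time.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and data layout are illustrative, not the STRI API.
final class StriSketch {

    /** Cosine coefficient between two sparse score vectors keyed by JD name. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        // A zero vector has no direction; report no similarity.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every ST against the input's JDI vector and sorts highest first. */
    static List<Map.Entry<String, Double>> rank(
            Map<String, Double> inputJdi,
            Map<String, Map<String, Double>> stJd) {
        List<Map.Entry<String, Double>> scores = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> e : stJd.entrySet()) {
            double score = cosine(inputJdi, e.getValue());
            scores.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), score));
        }
        scores.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return scores;
    }
}
```

Calling StriSketch.rank(inputJdi, stJd) returns the STs ordered from most to least similar, which corresponds to the sort-and-display step. Loading the ST-JD scores from the file is left out because the file layout is not described here.
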
The STRI program, along with the MEDLINE Tokenizer, is used to index MEDLINE records on:
  • Text: phrases, titles, abstracts, and combinations of titles and abstracts
  • MeSH: starred MeSH headings and subheadings

I. Java Software Components:

II. Programs: