
Text Categorization

Semantic Type - Word Sense Disambiguation

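STI (Semantic Type Indexing) uses the JDI (Journal Descriptor Indexing) methodology to calculate average ST (Semantic Type) scores from the Word-ST table. Because STI assigns ST scores to a text, it can be used for word sense disambiguation (WSD): among the candidate semantic types of an ambiguous word, the one with the best score is selected. STWSD is a tool developed for this purpose.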
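The averaging step can be illustrated with a minimal sketch. It assumes a hypothetical in-memory Word-ST table mapping each word to per-ST scores, and it assumes that words absent from the table are skipped when averaging (a word that lacks a given ST contributes 0 to that ST). The table name, its contents, and the scores are made up for illustration and are not the real STI data.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical Word-ST table: word -> {ST abbreviation: score}.
# The values are invented for illustration only.
WORD_ST_TABLE: Dict[str, Dict[str, float]] = {
    "cold": {"dsyn": 0.6, "npop": 0.4},
    "fever": {"dsyn": 0.8},
}


def average_st_scores(words: List[str], table: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each ST's score over the words found in the table.

    Assumption: words missing from the table are ignored; an ST missing
    from a word's row counts as 0 for that word.
    """
    known = [w for w in words if w in table]
    totals: Dict[str, float] = defaultdict(float)
    for w in known:
        for st, score in table[w].items():
            totals[st] += score
    # Empty dict when no word is known (no division is performed then).
    return {st: total / len(known) for st, total in totals.items()}


if __name__ == "__main__":
    print(average_st_scores(["cold", "fever", "patient"], WORD_ST_TABLE))
```

Here "patient" is not in the toy table, so the averages are taken over "cold" and "fever" only.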

I. Input

  • Input text
    The sentence or paragraph containing the ambiguous word(s)
  • Ambiguous word
    The target ambiguous word
  • ST candidates
    The possible Semantic Types, in abbreviated form

II. Output

  • The selected Semantic Type, in abbreviated form

III. Algorithm

  • Find variants of the ambiguous word
    • Use the Lexical Tools fruitful variants flow
    • Remove fruitful variants that contain punctuation (words in STI have no punctuation)
    • Unify and sort the fruitful variants
  • Check the WSD inputs
    • Check that the input text is not empty
    • Check that the ST candidates are legal STs
  • Find the forced legal words
    • Tokenize all variants into words
    • Unify and sort
  • Find the ST with the highest score
    • Use the default input filter (or tokenize words)
    • Get the STI scores (DC & WC)
    • Use the combined score system
    • Select the candidate ST with the highest combined score
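The steps above can be sketched end to end as follows. This is a minimal illustration under several stated assumptions, not the STWSD implementation: the Lexical Tools fruitful variants flow is replaced by a caller-supplied `variant_fn`; the default input filter is approximated by lowercase alphanumeric tokenization plus a small hypothetical stopword list, with forced legal words exempt from removal; the DC and WC tables are hypothetical word-to-ST score maps averaged over known words; and the combined score is assumed to be the plain mean of the DC and WC scores. All table contents are invented.

```python
import re
import string
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Table = Dict[str, Dict[str, float]]  # hypothetical: word -> {ST abbreviation: score}

# Hypothetical stopword list standing in for the default input filter.
STOPWORDS = {"a", "an", "the", "of", "in", "with", "is", "it", "has"}


def find_variants(word: str, variant_fn: Callable[[str], Iterable[str]]) -> List[str]:
    """Variants of the ambiguous word without punctuation, deduplicated and sorted."""
    variants = variant_fn(word)  # stand-in for the Lexical Tools fruitful variants flow
    return sorted({v for v in variants if not any(c in string.punctuation for c in v)})


def check_inputs(text: str, candidates: List[str], legal_sts: Iterable[str]) -> None:
    """Reject empty text and ST candidates that are not legal STs."""
    if not text.strip():
        raise ValueError("input text is empty")
    legal = set(legal_sts)
    illegal = [st for st in candidates if st not in legal]
    if illegal:
        raise ValueError(f"illegal ST candidates: {illegal}")


def forced_legal_words(variants: Iterable[str]) -> List[str]:
    """Tokenize (possibly multiword) variants into words, deduplicated and sorted."""
    return sorted({w for v in variants for w in v.split()})


def filter_input(text: str, forced: Iterable[str]) -> List[str]:
    """Assumed filter: drop stopwords unless the word is forced legal."""
    forced_set = set(forced)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS or t in forced_set]


def average_scores(words: List[str], table: Table) -> Dict[str, float]:
    """Average each ST's score over the words present in the table."""
    known = [w for w in words if w in table]
    totals: Dict[str, float] = defaultdict(float)
    for w in known:
        for st, score in table[w].items():
            totals[st] += score
    return {st: t / len(known) for st, t in totals.items()}


def stwsd(text: str, word: str, candidates: List[str], legal_sts: Iterable[str],
          variant_fn: Callable[[str], Iterable[str]], dc_table: Table, wc_table: Table) -> str:
    variants = find_variants(word, variant_fn)
    check_inputs(text, candidates, legal_sts)
    words = filter_input(text, forced_legal_words(variants))
    dc = average_scores(words, dc_table)
    wc = average_scores(words, wc_table)
    # Assumed combined score: mean of the DC and WC scores.
    combined = {st: (dc.get(st, 0.0) + wc.get(st, 0.0)) / 2 for st in candidates}
    return max(candidates, key=lambda st: combined[st])  # ties: first candidate wins


# Made-up example data for illustration only.
LEGAL_STS = {"dsyn", "npop"}
DC_TABLE: Table = {"cold": {"dsyn": 0.5, "npop": 0.5}, "fever": {"dsyn": 0.9, "npop": 0.1},
                   "weather": {"dsyn": 0.1, "npop": 0.9}}
WC_TABLE: Table = {"cold": {"dsyn": 0.4, "npop": 0.6}, "fever": {"dsyn": 0.8, "npop": 0.2}}


def toy_variants(w: str) -> List[str]:
    return {"cold": ["cold", "colds", "cold's"]}.get(w, [w])


if __name__ == "__main__":
    print(stwsd("The patient has a cold with fever.", "cold", ["dsyn", "npop"],
                LEGAL_STS, toy_variants, DC_TABLE, WC_TABLE))
```

With this toy data, "The patient has a cold with fever." resolves to `dsyn`, while "It is cold weather today." resolves to `npop`. Taking a simple mean of DC and WC is only one plausible way to combine the two scores; a weighted combination would slot into the same line.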