CSpell

Dictionaries from the SPECIALIST Lexicon

I. Introduction

The SPECIALIST Lexicon is a large syntactic lexicon of biomedical and general English. All lexical items are reviewed and verified by linguists. Several dictionaries are generated from the Lexicon to serve different needs.

II. Generation

  • Source Code: lexCheck2016/sources/gov/nih/nlm/nls/lexCheck/Api/ToDicVarsApi.java
  • Input: the annual release of the Lexicon
  • Generated during the pre-process
  • Output: lexiconDic.data
  • All lexical records in the Lexicon are converted to DicVar records with 7 fields:

    • Word: case sensitive
    • POS (value in parentheses):
        • adj (1)
        • adv (2)
        • aux (4)
        • compl (8)
        • conj (16)
        • det (32)
        • modal (64)
        • noun (128)
        • prep (256)
        • pron (512)
        • verb (1024)
    • Inflection (value in parentheses):
        • base (1)
        • comparative (2)
        • superlative (4)
        • plural (8)
        • presPart (16)
        • past (32)
        • pastPart (64)
        • pres3s (128)
        • positive (256)
        • singular (512)
        • infinitive (1024)
        • pres123p (2048)
        • pastNeg (4096)
        • pres123pNeg (8192)
        • pres1s (16384)
        • past1p23pNeg (32768)
        • past1p23p (65536)
        • past1s3sNeg (131072)
        • pres1p23p (262144)
        • pres1p23pNeg (524288)
        • past1s3s (1048576)
        • pres (2097152)
        • pres3sNeg (4194304)
        • presNeg (8388608)
    • Source (EUI): EUI
    • AcrAbb Flag: true or false
    • properNoun Flag: true or false
    • spVar Flag: true or false

    * The unique flag from inflVar is not used. It is set to false when all properties are the same but the types of inflectional rules differ.
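
The POS and Inflection values listed above are successive powers of two, so they can be handled as bit flags. The following Python sketch shows one way to decode them and to split a record into its 7 fields. It is an illustration only: the `|` separator, the function names, the sample record, and the placeholder EUI `E0000000` are assumptions, not taken from ToDicVarsApi.java or lexiconDic.data.

```python
# Names are copied from the lists above, in order; the i-th name has value 2**i.
POS_NAMES = ["adj", "adv", "aux", "compl", "conj", "det",
             "modal", "noun", "prep", "pron", "verb"]
INFL_NAMES = ["base", "comparative", "superlative", "plural", "presPart",
              "past", "pastPart", "pres3s", "positive", "singular",
              "infinitive", "pres123p", "pastNeg", "pres123pNeg", "pres1s",
              "past1p23pNeg", "past1p23p", "past1s3sNeg", "pres1p23p",
              "pres1p23pNeg", "past1s3s", "pres", "pres3sNeg", "presNeg"]

POS_BITS = {name: 1 << i for i, name in enumerate(POS_NAMES)}
INFL_BITS = {name: 1 << i for i, name in enumerate(INFL_NAMES)}


def decode(mask, table):
    """Return the names in `table` whose bit is set in `mask`."""
    return [name for name, bit in table.items() if mask & bit]


def parse_dic_var(line, sep="|"):
    """Split one record into the 7 documented fields.

    Assumption: fields appear in the documented order, joined by `sep`.
    The real file layout may differ.
    """
    word, pos, infl, eui, aa, pn, sv = line.split(sep)
    return {
        "word": word,                       # case sensitive, kept as-is
        "pos": decode(int(pos), POS_BITS),
        "inflection": decode(int(infl), INFL_BITS),
        "eui": eui,
        "acrAbb": aa == "true",
        "properNoun": pn == "true",
        "spVar": sv == "true",
    }


# Hypothetical record: a plural noun with a placeholder EUI.
record = parse_dic_var("colds|128|8|E0000000|false|false|false")
```

Because each value occupies its own bit, `decode` also works if a value combines several flags, for example `decode(1 | 512, INFL_BITS)` yields `base` and `singular`. Whether lexiconDic.data ever stores combined values is not stated here.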

III. Output

  • Directory: ${PRE_PROCESS}/data/Lexicon/${YEAR}/outData/Dic

    The following dictionaries are generated

    Dictionary          Description
    lexicon.all.dic     All terms, case sensitive
    lexicon.mw.dic      Multiwords, case sensitive
    lexicon.sw.dic      Single words, case sensitive
    lexicon.nw.dic      Non-words (unigrams found only in mw, not in sw)
    lexicon.ew.dic      Element words (= unigram = sw + nw), case sensitive
    lexicon.aa.dic      Abbreviations or acronyms, case sensitive
    lexicon.pn.dic      Proper nouns, case sensitive
    lexicon.sv.dic      Spelling variants, case sensitive
    lexicon.noAa.dic    English words and proper nouns (= en + pn = all - aa); used to check element words in split
    lexicon.paa.dic     Pure aa (= aa - en)
    lexicon.en.dic      English words (= all - pn - aa), case sensitive
    lexicon.swEn.dic    English words that are also single words
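
The set relations in the descriptions above (ew = sw + nw, en = all - pn - aa, paa = aa - en, noAa = en + pn) can be illustrated with Python sets. This is a sketch under stated assumptions, not the actual generation code: the records are toy data, a multiword is taken to be any term containing a space, and the aa and pn flags are applied per record, so a word such as "ad" can be both an English word and an abbreviation.

```python
# Toy records: (word, is_acr_abb, is_proper_noun). Not real Lexicon data.
records = [
    ("ad", False, False),        # ordinary English word
    ("ad", True, False),         # same spelling, flagged as an abbreviation
    ("NIH", True, False),
    ("Alzheimer", False, True),
    ("heart", False, False),
    ("attack", False, False),
    ("heart attack", False, False),
    ("in vitro", False, False),
]


def build_dictionaries(records):
    """Derive the dictionary sets from flagged records (illustrative only)."""
    all_terms = {w for w, _, _ in records}
    mw = {w for w in all_terms if " " in w}          # multiwords
    sw = all_terms - mw                              # single words
    unigrams = {tok for w in mw for tok in w.split()}
    nw = unigrams - sw                               # only inside multiwords
    aa = {w for w, is_aa, _ in records if is_aa}
    pn = {w for w, _, is_pn in records if is_pn}
    # English words: at least one record flagged neither aa nor pn.
    en = {w for w, is_aa, is_pn in records if not is_aa and not is_pn}
    return {
        "all": all_terms, "mw": mw, "sw": sw, "nw": nw,
        "ew": sw | nw,          # element words
        "aa": aa, "pn": pn, "en": en,
        "paa": aa - en,         # pure abbreviations/acronyms
        "noAa": en | pn,
        "swEn": en & sw,
    }


dics = build_dictionaries(records)
```

With this toy data, "vitro" lands in nw because it occurs only inside the multiword "in vitro", and "NIH" is the only pure aa entry because "ad" also has an ordinary English record.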

IV. Notes

  • Handles the possessive ('s) when checking whether a word is in the dictionary
  • Source code: DictionaryBasedSpellChecker.java
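
One plausible reading of the possessive handling is sketched below in Python: a word counts as known if it is in the dictionary either as written or after a trailing 's is removed. The function names are hypothetical and the logic is an assumption. The actual behavior of DictionaryBasedSpellChecker.java is not shown in this document.

```python
def strip_possessive(word):
    """Drop a trailing 's if something remains in front of it."""
    suffix = "'s"
    if word.endswith(suffix) and len(word) > len(suffix):
        return word[: -len(suffix)]
    return word


def in_dictionary(word, dictionary):
    """Case-sensitive lookup that also accepts the possessive form."""
    return word in dictionary or strip_possessive(word) in dictionary


# Toy dictionary for illustration only.
toy_dic = {"patient", "Alzheimer"}
```

For example, `in_dictionary("patient's", toy_dic)` is true while `in_dictionary("patients", toy_dic)` is false. The lookup stays case sensitive, matching the dictionaries described above.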