CSpell

CSpell Dictionary Summary

This page summarizes the dictionary files used in CSpell.

I. CSpell Dictionary files

Type | Source files | Notes
CS_CHECK_DIC_FILES | check.dic
  • Generic dictionary for checking valid unigrams
  • Used in NW/RW, Split/1To1 detectors
CS_SUGGEST_DIC_FILES | sugg.dic (same as check.dic)
  • Generic dictionary for checking valid unigrams
  • Used in NW/RW, Split/1To1 detectors
CS_SPLIT_WORD_DIC_FILES | split.dic
  • Used in NW/RW Merge detectors
  • Used in NW/RW Split candidates
CS_MW_DIC_FILE | lexicon.mw.dic
  • Used in NW/RW Split candidates (check if the split candidate is a multiword)
  • Used in NW/RW Merge candidates (check if the focus token and context tokens are multiword, then no merge)
CS_UNIT_DIC_FILE | unit.data
  • Used in NW/RW Merge/Split/1To1 detector to check exceptions
  • Used in RW Split candidates (can't be unit, such as mg)
CS_SV_DIC_FILE | sv.dic
  • Spelling variants
  • Not used for now
  • To be used for RW-1To1 Detector
CS_AA_DIC_FILE | lexicon.aa.dic
  • Abbreviation or Acronym in Lexicon
  • Used in NW/RW Merge candidates (don't merge context if Aa)
  • Used in RW 1-to-1 detector
CS_PN_DIC_FILE | lexicon.pn.dic
  • Proper noun in Lexicon
  • Used in RW Split candidate (split word can't be pn)
  • Used in RW 1-to-1 detector
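
The notes above describe dictionaries being consulted for unigram validity and for multiword checks before a merge. The following is a minimal Python sketch of that kind of lookup, not CSpell's actual implementation. It assumes each dictionary file holds one entry per line and that matching is case-insensitive; the function names `load_dictionary`, `is_non_word`, and `blocks_merge` are hypothetical, and the sample entries are toy data.

```python
from typing import Iterable, Set


def load_dictionary(lines: Iterable[str]) -> Set[str]:
    """Build a lookup set from dictionary lines (assumed: one entry per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_non_word(token: str, check_dic: Set[str]) -> bool:
    """Treat a token as a non-word (NW) when it is absent from the check dictionary."""
    return token.lower() not in check_dic


def blocks_merge(focus: str, context: str, mw_dic: Set[str]) -> bool:
    """Sketch of the multiword rule: no merge if focus + context is a known multiword."""
    return f"{focus} {context}".lower() in mw_dic


# Toy entries standing in for check.dic and lexicon.mw.dic.
check_dic = load_dictionary(["diabetes", "mellitus", "mg"])
mw_dic = load_dictionary(["diabetes mellitus"])

print(is_non_word("diabetis", check_dic))            # True
print(blocks_merge("diabetes", "mellitus", mw_dic))  # True
```

Holding each dictionary as an in-memory set keeps every check a constant-time membership test, which suits detectors that query the same files for every token.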

II. Source Dictionary Files

File | Source | Notes
Lexicon Release
NRVAR.1.uSort.data | Lexicon | Lexicon Number variants
  • NRVAR
  • field 1
  • uniquely sorted
lexicon.ew.dic | Lexicon | Lexicon Element Words
  • Unigrams of all Lexical Entries
lexicon.enEwLc.dic.addRm | Lexicon | Lexicon English Element Word, Lowercase
English = Lexicon - Aa - Pn (Lexicon entries that are neither abbreviations/acronyms nor proper nouns)

remove:

  • amita
  • anil
  • anser
  • catacholamine
  • diaphram
  • flavanoid
  • flavanoids
  • glucoma
  • losangeles
  • palmita

add:

  • i'm
  • i've
  • medline
  • medlineplus
  • y'all
lexicon.swNoAaLc.dic | Lexicon | Lexicon Single Word, Not AA, Lowercase
  • Single Word
  • Not pure AA (aids is included because it is the third-person singular of aid)
  • Lowercase
lexicon.mw.dic | Lexicon | Lexicon Multiwords
lexicon.aa.dic | Lexicon | Lexicon Abbreviations and Acronyms
lexicon.pn.dic | Lexicon | Lexicon Proper Noun
lexicon.sv.dic | Lexicon | Lexicon Spelling Variants
Consumer Health Related Data
Med.l.dic | Med from UMLS-ST
  • Using Lexicon as English Dictionary
Med.cm-l.dic | Med from UMLS-ST
  • cm: Consumer and Medline data
  • -l: exclude Lexicon
Others
unit.data | Lexicon PreProcess | Unit collection from the Lexicon multiword generation process
cistomerDic.data | CSpell PreProcess | Empirical data collection (entries that are not from the Lexicon or Consumer Health Data)
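
The lexicon.enEwLc.dic.addRm entry above describes a derivation: take the Lexicon element words, drop abbreviations/acronyms and proper nouns, lowercase the rest, then apply the listed removals and additions. A minimal Python sketch of those set operations follows. It is an illustration only: the function name `build_english_lowercase` is hypothetical, exact-match subtraction before lowercasing is an assumption about the order of steps, and the input words are toy data. Only the REMOVE and ADD lists are copied from this page.

```python
from typing import Iterable, Set

# Lists copied from the lexicon.enEwLc.dic.addRm entry on this page.
REMOVE = {"amita", "anil", "anser", "catacholamine", "diaphram",
          "flavanoid", "flavanoids", "glucoma", "losangeles", "palmita"}
ADD = {"i'm", "i've", "medline", "medlineplus", "y'all"}


def build_english_lowercase(element_words: Iterable[str],
                            aa: Iterable[str],
                            pn: Iterable[str]) -> Set[str]:
    """English = Lexicon - Aa - Pn, lowercased, then remove/add applied."""
    # Assumption: Aa and Pn entries are subtracted by exact match before lowercasing.
    non_english = set(aa) | set(pn)
    english = {w.lower() for w in element_words if w not in non_english}
    return (english - REMOVE) | ADD


# Toy input, not real Lexicon data.
words = ["diaphragm", "diaphram", "NIH", "Bethesda", "Glucose"]
result = build_english_lowercase(words, aa=["NIH"], pn=["Bethesda"])
print(sorted(result))
```

In this sketch the removals are applied before the additions, so a word appearing in both lists would end up included; the two lists on this page do not overlap, so the order does not matter here.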