CSpell

Dictionary from MEDLINE

I. Introduction

Words from MEDLINE titles and abstracts are used to generate a dictionary, which is then tested in CSpell.

II. Algorithm

  • The MEDLINE N-gram set is used to retrieve terms that are
    • unigrams
    • with word count >= 30
  • The core term of each unigram retrieved above is used for the dictionary
    • converted to lowercase
    • combined by core term
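
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual CSpell pre-process code. It assumes that the unigrams are already available as (term, word count) pairs, that "core term" means the term with leading and trailing punctuation stripped, and that "combined" means entries sharing a core term are merged into one. The names core_term and build_dictionary are hypothetical.

```python
import string

MIN_WORD_COUNT = 30  # threshold stated in the algorithm above


def core_term(term: str) -> str:
    """Assumed definition: strip leading/trailing ASCII punctuation and spaces."""
    return term.strip(string.punctuation + string.whitespace)


def build_dictionary(unigrams):
    """Build a sorted list of lowercase core terms.

    `unigrams` is an iterable of (term, word_count) pairs (an assumed
    in-memory form of the MEDLINE unigram data).
    """
    words = set()
    for term, count in unigrams:
        if count < MIN_WORD_COUNT:
            continue  # keep only unigrams with word count >= 30
        core = core_term(term).lower()
        if core:  # skip terms that were punctuation only
            words.add(core)  # the set merges entries with the same core term
    return sorted(words)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        ("Cancer", 120),
        ("(cancer)", 45),
        ("cancer,", 30),
        ("Protein", 30),
        ("xyzzy", 3),
        ("--", 50),
    ]
    print(build_dictionary(sample))  # ['cancer', 'protein']
```

In this sketch the word-count filter is applied before the core term is taken, which follows the order of the steps listed above; whether counts of variants are summed before filtering is not specified here.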

III. Output

  • File name: ${PRE_PROCESS}/data/Medline/${YEAR}/outData/medline.dic
  • Format: lowercase unigrams