CSpell

Consumer Data (From Dina)

I. Introduction

This page describes the consumer data used in the baseline dictionary. The data set contains four files:

  • umls_anatomy_merged.txt
  • umls_interventions_merged.txt
  • umls_population_merged.txt
  • umls_problem_merged.txt

II. Algorithm

The above 4 files are generated from UMLS (2013AB?) by the following steps:

  • Retrieve English strings from UMLS, filtered by semantic types
    • ST list (abb): the selected Semantic Types, given as abbreviations
    • SRDEF: converts each ST abb to a TUI
    • MRSTY.RRF: CUI|TUI, used as the filter
    • MRCONSO.RRF: Terms|CUI, used to retrieve terms
  • Convert to lower case
  • Add some terms from Gopher, problem list, Susan's data, etc.
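
The steps above can be sketched in Python as follows. This is a minimal illustration, not the RunPreProc program itself: the function names are hypothetical, the field positions reflect the standard pipe-delimited layouts of SRDEF, MRSTY.RRF and MRCONSO.RRF as understood here, and collapsing duplicates with a set is an assumption, since this page does not say whether each merged file is de-duplicated.

```python
def abbreviations_to_tuis(srdef_lines, st_abbs):
    """Map the selected semantic-type abbreviations to TUIs using SRDEF rows."""
    wanted = set(st_abbs)
    tuis = set()
    for line in srdef_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed SRDEF layout: RT|UI|STY_RL|STN_RTN|DEF|EX|UN|NH|ABR|RIN|
        if len(fields) > 8 and fields[0] == "STY" and fields[8] in wanted:
            tuis.add(fields[1])
    return tuis


def cuis_for_tuis(mrsty_lines, tuis):
    """Collect the CUIs whose semantic type (TUI) is in the selected set."""
    cuis = set()
    for line in mrsty_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed MRSTY.RRF layout: CUI|TUI|STN|STY|ATUI|CVF|
        if len(fields) > 1 and fields[1] in tuis:
            cuis.add(fields[0])
    return cuis


def english_terms(mrconso_lines, cuis):
    """Retrieve lower-cased English strings for the selected CUIs."""
    terms = set()
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed MRCONSO.RRF layout: CUI is field 0, LAT field 1, STR field 14
        if len(fields) > 14 and fields[0] in cuis and fields[1] == "ENG":
            terms.add(fields[14].lower())
    return terms


def build_term_list(srdef_lines, mrsty_lines, mrconso_lines, st_abbs, extra_terms=()):
    """Run the three filters in order, then add terms from other sources."""
    tuis = abbreviations_to_tuis(srdef_lines, st_abbs)
    cuis = cuis_for_tuis(mrsty_lines, tuis)
    terms = english_terms(mrconso_lines, cuis)
    # Extra terms (Gopher, problem list, ...) are added unchanged; whether the
    # original process normalizes them is not stated on this page.
    terms.update(extra_terms)
    return sorted(terms)
```

Each function takes an iterable of lines, so an open file handle can be passed directly, for example `build_term_list(open("SRDEF"), open("MRSTY.RRF"), open("MRCONSO.RRF"), ["anst"])`, where the paths and the abbreviation list are placeholders.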

III. Analysis

File Name                      Semantic Types  Terms      Not UMLS (No CUI)
-----------------------------  --------------  ---------  -------------------------------
umls_anatomy_merged.txt        9               295,932    0
umls_interventions_merged.txt  65              528,668    expo: 5,457
umls_population_merged.txt     4               5,898      0
umls_problem_merged.txt        68              644,839    prob: 1,643 (from Gopher Terms)
Total Terms                    147             1,475,204  all.txt.1
Total Unique Terms             97              1,469,339  all.txt.1.uSort
Total Tokens                   N/A             299,669    medDic.data
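
The relation between the "Total Terms" and "Total Unique Terms" rows can be illustrated with a short Python sketch. It assumes that all.txt.1 is the concatenation of the four term lists and that all.txt.1.uSort is its unique, sorted form, which the file names suggest but this page does not state; the function name is hypothetical.

```python
def count_terms(term_lists):
    """Return (total, unique) term counts over several term lists."""
    # Concatenate all lists, keeping duplicates, for the total count.
    all_terms = [term for terms in term_lists for term in terms]
    # Drop duplicates and sort, as a unique sort would.
    unique_sorted = sorted(set(all_terms))
    return len(all_terms), len(unique_sorted)
```

A term that appears in more than one file, or more than once in a file, adds to the total count but only once to the unique count.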

IV. Others

  • Program: ${PRE_PROCESS}/bin/RunPreProc
  • Data: ${PRE_PROCESS}/data/Baseline/inData
  • Data: ${PRE_PROCESS}/data/Baseline/outData

  • If the data are generated from UMLS 2013AA, three ST abbreviations are not found in the 2013AA SRDEF (they existed before 2009AB):

    ST abb  Source File (term no)
    ------  ---------------------------------
    alga    umls_problem_merged.txt (1)
    invt    umls_interventions_merged.txt (1)
            umls_problem_merged.txt (33)
    rich    umls_problem_merged.txt (3)
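
A check for ST abbreviations that are missing from a given SRDEF release could look like the Python sketch below. It is an illustration under assumptions, not part of the original tooling: the function name is hypothetical, and the abbreviation is read from the ninth field (ABR) of each STY row, following the standard SRDEF layout as understood here.

```python
def missing_abbreviations(srdef_lines, st_abbs):
    """Return the selected ST abbreviations that have no STY row in SRDEF."""
    known = set()
    for line in srdef_lines:
        fields = line.rstrip("\n").split("|")
        # Assumed SRDEF layout: RT|UI|STY_RL|STN_RTN|DEF|EX|UN|NH|ABR|RIN|
        if len(fields) > 8 and fields[0] == "STY":
            known.add(fields[8])
    return sorted(set(st_abbs) - known)
```

Running such a check before the retrieval step makes it visible which abbreviations in the ST list will silently match no TUI for the chosen UMLS release.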

V. Other Resources

Other resources are merged into the above 4 files: