CSpell

Consumer Data (From Dina)

I. Introduction

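This page describes the consumer data used in the baseline dictionary. The data set contains four files:

- umls_anatomy_merged.txt
- umls_interventions_merged.txt
- umls_population_merged.txt
- umls_problem_merged.txt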

II. Algorithm

The four files above were generated from UMLS (2013AB?) by the following steps:

1. Retrieve English strings from UMLS, filtered by semantic type:
   - ST list (abb): the selected semantic types, as abbreviations
   - SRDEF: maps each ST abbreviation to its TUI
   - MRSTY.RRF: CUI|TUI, used as the filter
   - MRCONSO.RRF: Terms|CUI, used to retrieve the terms
2. Convert the terms to lowercase.
3. Add some terms from Gopher, the problem list, Susan's data, etc.
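
The retrieval and lowercasing steps above can be sketched in Python as follows. This is a minimal illustration, not the script that produced the files: the function names are hypothetical, and the column positions (ABR in SRDEF, LAT and STR in MRCONSO.RRF) are assumed from the usual pipe-delimited UMLS layouts and should be checked against the release in use. The extra terms from Gopher, the problem list, and other sources are not covered.

```python
from typing import Iterable, List, Set

# Column positions are ASSUMPTIONS based on the usual pipe-delimited UMLS
# layouts; verify them against the column documentation for your release.
SRDEF_RT, SRDEF_UI, SRDEF_ABR = 0, 1, 8
MRSTY_CUI, MRSTY_TUI = 0, 1
MRCONSO_CUI, MRCONSO_LAT, MRCONSO_STR = 0, 1, 14


def _fields(line: str) -> List[str]:
    """Split one pipe-delimited row into its fields."""
    return line.rstrip("\r\n").split("|")


def abbs_to_tuis(srdef_lines: Iterable[str], st_abbs: Set[str]) -> Set[str]:
    """Convert selected semantic-type abbreviations to TUIs via SRDEF."""
    tuis = set()
    for line in srdef_lines:
        f = _fields(line)
        # Only semantic-type rows (RT == "STY") carry a TUI.
        if len(f) > SRDEF_ABR and f[SRDEF_RT] == "STY" and f[SRDEF_ABR] in st_abbs:
            tuis.add(f[SRDEF_UI])
    return tuis


def cuis_for_tuis(mrsty_lines: Iterable[str], tuis: Set[str]) -> Set[str]:
    """Collect the CUIs whose semantic type (TUI) is in the selected set."""
    cuis = set()
    for line in mrsty_lines:
        f = _fields(line)
        if len(f) > MRSTY_TUI and f[MRSTY_TUI] in tuis:
            cuis.add(f[MRSTY_CUI])
    return cuis


def english_lowercase_terms(mrconso_lines: Iterable[str], cuis: Set[str]) -> Set[str]:
    """Retrieve English strings for the selected CUIs and lowercase them."""
    terms = set()
    for line in mrconso_lines:
        f = _fields(line)
        if len(f) > MRCONSO_STR and f[MRCONSO_LAT] == "ENG" and f[MRCONSO_CUI] in cuis:
            terms.add(f[MRCONSO_STR].lower())
    return terms


def build_terms(srdef_lines, mrsty_lines, mrconso_lines, st_abbs):
    """Chain the three lookups: ST abb -> TUI -> CUI -> lowercased terms."""
    tuis = abbs_to_tuis(srdef_lines, st_abbs)
    cuis = cuis_for_tuis(mrsty_lines, tuis)
    return english_lowercase_terms(mrconso_lines, cuis)
```

Each function accepts any iterable of lines, so open file handles for SRDEF, MRSTY.RRF, and MRCONSO.RRF can be passed directly, and the large MRCONSO.RRF file is streamed rather than loaded into memory.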

III. Analysis

File Name	Semantic Types	Terms	Not UMLS (No CUI)
umls_anatomy_merged.txt	9	295,932	0
umls_interventions_merged.txt	65	528,668	expo: 5,457
umls_population_merged.txt	4	5,898	0
umls_problem_merged.txt	68	644,839	prob: 1,643 (from Gopher Terms)

Summary	Semantic Types	Terms	File
Total Terms	147	1,475,204	all.txt.1
Total Unique Terms	97	1,469,339	all.txt.1.uSort
Total Tokens	N/A	299,669	medDic.data
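
The three summary counts can be illustrated with a short Python sketch. It assumes that "Total Terms" counts every line across the four files, that "Total Unique Terms" counts distinct lines after sorting, and that "Total Tokens" counts distinct whitespace-delimited words. The page does not state the tokenization rule used for medDic.data, so the last assumption in particular may differ from the actual process. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def summarize(term_lists: Iterable[List[str]]) -> Tuple[int, int, int]:
    """Return (total terms, unique terms, unique tokens) for merged term lists."""
    # Concatenate all files' terms, keeping duplicates (total terms).
    all_terms = [term for terms in term_lists for term in terms]
    # Sort and deduplicate (unique terms).
    unique_terms = sorted(set(all_terms))
    # ASSUMPTION: tokens are whitespace-delimited words, counted once each.
    tokens = {tok for term in unique_terms for tok in term.split()}
    return len(all_terms), len(unique_terms), len(tokens)


total, unique, n_tokens = summarize([["left arm", "arm"], ["arm", "flu shot"]])
print(total, unique, n_tokens)
```

In this toy input, "arm" appears in two lists, so the unique count is one lower than the total.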

IV. Others

ST abbreviation	Source File (number of terms)
alga	umls_problem_merged.txt (1)
invt	umls_interventions_merged.txt (1) umls_problem_merged.txt (33)
rich	umls_problem_merged.txt (3)

V. Other Resources

The following resources were merged into the four files above:

File Name	Semantic Types	Terms	Not UMLS (No CUI)
interventions.txt (PICO)	76	30,492	expo: 6,344
umls_problem_list.txt (UMLS)	71	254,420	prob: 1,792