LuiAssignment Analysis
I. Summary
This test/analysis is based on the feedback reports from the OCCS UMLS group (Soma Lanka). The OCCS UMLS group uses the new release of the Lexical Tools (luiNorm) to assign new LUIs to UMLS strings for the new release of UMLS. Every string is assigned to a LUI based on its luiNorm form. The OCCS UMLS group runs a program that compares the LUI assignments of the new release with those of the previous release and produces three files, as described below:
These three files share the same format. Each record has 4 fields:
Old LUI | New LUI | SUI | String |
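As a minimal sketch of how one of these files might be read, the Python below parses each record into its four fields. It assumes the fields are pipe-delimited with a trailing pipe, as in the layout above; the names `LuiChange` and `parse_lui_changes` and the sample identifiers are hypothetical and are not part of the Lexical Tools.

```python
from typing import Iterable, List, NamedTuple


class LuiChange(NamedTuple):
    """One record of a LUI comparison file (hypothetical helper type)."""
    old_lui: str
    new_lui: str
    sui: str
    string: str


def parse_lui_changes(lines: Iterable[str]) -> List[LuiChange]:
    """Parse 'Old LUI|New LUI|SUI|String|' records, skipping blank lines.

    Assumption: fields are separated by '|' and the record ends with '|'.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first three pipes only, so the String field stays whole.
        parts = line.split("|", 3)
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        old_lui, new_lui, sui, text = (p.strip() for p in parts)
        # Drop the trailing field delimiter, if present.
        if text.endswith("|"):
            text = text[:-1].strip()
        records.append(LuiChange(old_lui, new_lui, sui, text))
    return records


# Made-up identifiers, for illustration only.
sample = ["L0000001|L0000002|S0000003|Example string|"]
for record in parse_lui_changes(sample):
    print(record.old_lui, record.new_lui, record.sui, record.string)
```

Splitting on only the first three delimiters is a defensive choice: it keeps the string intact even if it happens to contain a pipe character.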
Based on these three files, the Lexical Systems Group analyzes the causes of the changes, fixes bugs, and enhances features of the luiNorm flow.
II. Analysis
A change in LUI (luiNorm form) can be caused by a change in the software algorithm or in the Lexicon data. We would like to know as much detail as possible to make sure luiNorm behaves the way we expect. The analysis is straightforward. The following steps are used to identify which flow component causes the change:
Tag | Condition |
---|---|
S | Same luiNorm forms |
C | Change in luiNorm forms |
Flow component | Tag | Cause | Testing Flows | Prev Flows |
---|---|---|---|---|
-f:q7 | q7 | Unicode Core Norm | -f:T:q7 | -f:T:q:q2 |
-f:g | g | Remove genitive | -f:g | -f:g |
-f:rs | rs | Remove parenthetical plural forms | -f:rs | -f:rs |
-f:o | o | Remove punctuation | -f:o | -f:o |
-f:t | t | Remove stopWords | -f:t | -f:t |
-f:l | l | Lowercase | -f:l | -f:l |
-f:B | B | Retrieve the uninflected form | -f:B | -f:B |
-f:C | C | Retrieve the Canonical form | -f:C | -f:C |
-f:q8 | q8 | Strip or Map Unicode to ASCII | -f:q8 | -f:g4 |
-f:w | w | Sort words by order | -f:w | -f:w |
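The tagging idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the group's actual procedure: it assumes the output of each flow component has already been collected for one string from both the new release (Testing Flows) and the previous release (Prev Flows), keyed by the tags in the table above. The function names and the sample outputs are hypothetical.

```python
from typing import Dict, List

# Flow component tags, in the order listed in the table above.
FLOW_TAGS = ["q7", "g", "rs", "o", "t", "l", "B", "C", "q8", "w"]


def tag_components(new_out: Dict[str, str], prev_out: Dict[str, str]) -> Dict[str, str]:
    """Tag each flow component 'S' (same output) or 'C' (changed output)."""
    return {
        tag: "S" if new_out[tag] == prev_out[tag] else "C"
        for tag in FLOW_TAGS
    }


def changed_components(tags: Dict[str, str]) -> List[str]:
    """Return the tags of the components whose output changed, in flow order."""
    return [tag for tag in FLOW_TAGS if tags[tag] == "C"]


# Made-up outputs for a single string: only the q7 component differs.
prev_outputs = {tag: "example" for tag in FLOW_TAGS}
new_outputs = dict(prev_outputs, q7="example (changed)")

tags = tag_components(new_outputs, prev_outputs)
print(changed_components(tags))
```

Keeping the tags in flow order makes it easy to see the earliest component at which the two releases diverge.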
As mentioned above, a change might be caused by a change in the software algorithm or in the Lexicon data. These are discussed as follows:
In 2008, luiNorm was enhanced to 10 flow components as follows:
Compare the input and output of the same flow components.
Flow Component | Tag | Detail Cause |
---|---|---|
-f:E | CE | New EUI, lexical records |
-f:s -CR:o | Cs | New Spelling variants |
-f:C | C | New words in UMLS/Lexicon or new rules in Lexicon |
III. Procedures: Run the Analysis
IV. Reports