
Lexical Tools

Comparison of the Optimized Sets, 2014 - 2021

I. New SD-Rules Evaluation Results:

The releases since 2014 that applied this approach to retrieve the optimized SD-Rule set are compared as follows:

Release | New SD-Rules | Baseline | Results | Notes
2014 | First release (based on the 2013 SD-Rules)
  • Baseline:
    • Total candidate SD-pairs: 43,375
    • Total valid candidate SD-pairs (SD-Facts: relevant): 37,136
  • Results: N/A (all SD-Rules are evaluated for the first time)
2015 | Added 15 new SD-Rules to the previous release
  • Baseline:
    • Total candidate SD-pairs: 53,905
    • Total valid candidate SD-pairs (SD-Facts: relevant): 46,950
  • Results:
    • 2 are duplicates (child-rules of existing rules).
    • 11 (84.62%, 11/13) of the remaining 13 are evaluated as good rules in the optimized set.
    • 2 (15.38%, 2/13) are bad rules.
2016 | Added 12 new SD-Rules to the previous release
  • Baseline:
    • Total candidate SD-pairs: 58,422
    • Total valid candidate SD-pairs: 50,814
  • Results:
    • 1 is a duplicate (of an existing rule).
    • 8 (72.73%, 8/11) of the remaining 11 are evaluated as good rules in the optimized set.
    • 3 (27.27%, 3/11) are bad rules.
2017 | Added 11 new SD-Rules to the previous release
  • Baseline:
    • Total candidate SD-pairs: 59,850
    • Total valid candidate SD-pairs: 51,788
  • Results:
    • 1 is a duplicate (of an existing rule).
    • 6 (60.00%, 6/10) of the remaining 10 are evaluated as good rules in the optimized set.
    • 4 (40.00%, 4/10) are bad rules.
2020 | Added 18 new SD-Rules to the previous release
  • Baseline:
    • Total candidate SD-pairs: 61,777
    • Total valid candidate SD-pairs: 53,440
  • Results:
    • 7 are duplicates (of existing rules).
    • 7 (63.64%, 7/11) of the remaining 11 are evaluated as good rules in the optimized set.
    • 4 (36.36%, 4/11) are bad rules.
2021 | Proposed 21 new SD-Rules to add to the previous release
  • Baseline:
    • Total candidate SD-pairs: 63,712
    • Total valid candidate SD-pairs: 54,421
  • Results:
    • 3 are duplicates (of existing rules).
    • 12 (66.67%, 12/18) of the remaining 18 are evaluated as good rules in the optimized set.
    • 6 (33.33%, 6/18) are bad rules.
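The good/bad percentages in the table above are taken over the new rules that remain after duplicates are removed (for 2015, 13 = 15 - 2, so 11/13 = 84.62%). A minimal Python sketch of that arithmetic, using the counts from the table; `NEW_RULES` and `evaluate` are illustrative names, not part of Lexical Tools:

```python
# Counts of new SD-Rules per release, copied from the table above.
# Each tuple: (release, proposed new rules, duplicates, good rules)
NEW_RULES = [
    (2015, 15, 2, 11),
    (2016, 12, 1, 8),
    (2017, 11, 1, 6),
    (2020, 18, 7, 7),
    (2021, 21, 3, 12),
]


def evaluate(proposed: int, duplicated: int, good: int) -> tuple[int, int, float, float]:
    """Return (evaluated, bad, good %, bad %) for one release.

    Duplicates are dropped before evaluation, so both percentages
    are computed over (proposed - duplicated) rules.
    """
    evaluated = proposed - duplicated
    bad = evaluated - good
    return evaluated, bad, 100.0 * good / evaluated, 100.0 * bad / evaluated


for release, proposed, duplicated, good in NEW_RULES:
    evaluated, bad, good_pct, bad_pct = evaluate(proposed, duplicated, good)
    print(f"{release}: {good}/{evaluated} good ({good_pct:.2f}%), "
          f"{bad}/{evaluated} bad ({bad_pct:.2f}%)")
```

For 2015 this prints `2015: 11/13 good (84.62%), 2/13 bad (15.38%)`.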

II. Comparison of SD-Rule set:

Year | Stats | Optimized Diagram
2014
  • Baseline Set (including parent-child rules): 107
  • Total Unique Rules: 96
  • Total Good Rules: 73
  • Total Valid SD-pairs (SD-Facts: Relevant): 42,552
  • Opti. System Precision: 95.30%
  • Opti. System Recall: 95.01%
  • Opti. System Performance: 1.9031
  • Cutoff Rule: ar$|adj|e$|noun
  • Optimized Set: 2014 Optimized Set
2015
  • Baseline Set (including parent-child rules): 120
  • Total Unique Rules: 101
  • Total Good Rules: 76
  • Total Valid SD-pairs (SD-Facts: Relevant): 46,950
  • Opti. System Precision: 95.22%
  • Opti. System Recall: 95.70%
  • Opti. System Performance: 1.9093
  • Cutoff Rule: ar$|adj|e$|noun
  • Optimized Set: 2015 Optimized Set
2016
  • Baseline Set (including parent-child rules): 132
  • Total Unique Rules: 111
  • Total Good Rules: 82
  • Total Valid SD-pairs (SD-Facts: Relevant): 50,814
  • Opti. System Precision: 95.00%
  • Opti. System Recall: 95.26%
  • Opti. System Performance: 1.9026
  • Cutoff Rule: $|noun|ist$|noun
  • Optimized Set: 2016 Optimized Set
2017
  • Baseline Set (including parent-child rules): 142
  • Total Unique Rules: 119
  • Total Good Rules: 86
  • Total Valid SD-pairs (SD-Facts: Relevant): 51,788
  • Opti. System Precision: 95.09%
  • Opti. System Recall: 94.92%
  • Opti. System Performance: 1.9001
  • Cutoff Rule: $|noun|ist$|noun
  • Optimized Set: 2017 Optimized Set
2020
  • Baseline Set (including parent-child rules): 153
  • Total Unique Rules: 130
  • Total Good Rules: 93
  • Total Valid SD-pairs (SD-Facts: Relevant): 53,440
  • Opti. System Precision: 95.00%
  • Opti. System Recall: 94.48%
  • Opti. System Performance: 1.8948
  • Cutoff Rule: ar$|adj|e$|noun
  • Optimized Set: 2020 Optimized Set
2021
  • Baseline Set (including parent-child rules): 170
  • Total Unique Rules: 148
  • Total Good Rules: 104
  • Total Valid SD-pairs (SD-Facts: Relevant): 54,421
  • Opti. System Precision: 95.12%
  • Opti. System Recall: 93.45%
  • Opti. System Performance: 1.8857
  • Cutoff Rule: ctic$|adj|xis$|noun
  • Optimized Set: 2021 Optimized Set
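In every release above, the reported Opti. System Performance matches precision + recall expressed as fractions (for 2014, 0.9530 + 0.9501 = 1.9031), to within the rounding of the displayed values. This is an inference from the numbers, not a definition stated on this page. The Python sketch below, with the hypothetical helper `performance`, recomputes the sums and compares them with the reported values:

```python
# (precision %, recall %, reported performance) per release,
# copied from the comparison table above.
RESULTS = {
    2014: (95.30, 95.01, 1.9031),
    2015: (95.22, 95.70, 1.9093),
    2016: (95.00, 95.26, 1.9026),
    2017: (95.09, 94.92, 1.9001),
    2020: (95.00, 94.48, 1.8948),
    2021: (95.12, 93.45, 1.8857),
}


def performance(precision_pct: float, recall_pct: float) -> float:
    """Assumed definition: performance = precision + recall (as fractions)."""
    return precision_pct / 100.0 + recall_pct / 100.0


for year, (p, r, reported) in RESULTS.items():
    computed = performance(p, r)
    # Precision and recall are shown with two decimals, so allow a small
    # rounding tolerance when comparing with the reported value.
    assert abs(computed - reported) <= 2e-4, year
    print(f"{year}: P + R = {computed:.4f} (reported {reported:.4f})")
```

The only mismatch is 2015 (1.9092 computed vs. 1.9093 reported), a difference of 0.0001 that fits within the rounding of the two-decimal percentages.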

For the optimized sets:

  • The optimized sets of the 2014 and 2015 releases are similar; see SD-Rule rank mapping, 2014-15 for details.
  • The optimized sets are consistent over the years (good rules stay good):
    • The 2014 optimized set has 96 SD-Rules, 73 of which are good.
    • The 2015 optimized set has 101 SD-Rules, 76 of which are good.
    • The 2016 optimized set has 111 SD-Rules, 82 of which are good.
    • The 2017 optimized set has 119 SD-Rules, 86 of which are good.
    • The 2020 optimized set has 130 SD-Rules, 93 of which are good.

    • All good rules in 2014 are good in 2015.
    • All good rules in 2015 are good in 2016, except for 1 (ar$|adj|e$|noun).
    • All good rules in 2016 are good in 2017.
    • All good rules in 2017 are good in 2020.
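Checking that good rules stay good between releases amounts to a set comparison: every good rule of the earlier release should also be in the good rules of the later one, and any exceptions are the set difference. A minimal Python sketch; `dropped_good_rules` and `all_stay_good` are hypothetical helpers, and apart from ar$|adj|e$|noun (the one 2015 good rule reported as not good in 2016) the rule names in the example are placeholders, not actual SD-Rules:

```python
def dropped_good_rules(good_before: set[str], good_after: set[str]) -> set[str]:
    """Return the rules that were good in one release but not in the next."""
    return good_before - good_after


def all_stay_good(good_before: set[str], good_after: set[str]) -> bool:
    """True when every good rule of the earlier release is still good."""
    return good_before <= good_after


# Illustrative sets only: rule_A, rule_B and rule_C are placeholders.
good_2015 = {"ar$|adj|e$|noun", "rule_A", "rule_B"}
good_2016 = {"rule_A", "rule_B", "rule_C"}

print(all_stay_good(good_2015, good_2016))       # False
print(dropped_good_rules(good_2015, good_2016))  # {'ar$|adj|e$|noun'}
```

With the real good-rule lists of two consecutive releases, an empty difference corresponds to the "all good rules stay good" statements above.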

III. Transaction History:

  • Baseline: the collected candidate SD-Rules
  • Unique Rules: the baseline with child-rules removed
  • Good Rules: the rules used in the Lexical Tools SD-Rule set
2014
  • Baseline: 107
  • Unique Rules: 96
    • 11 child-rules were removed from the baseline
    • 96 = 107 - 11
  • Good Rules: 73
New Rules for the 2015 release: 15
  • Rules | ES (Expert-Suggest) | NOM_D | ORG_D | Sub-Total
    Total Rules | 7 | 6 | 2 | 15
    Duplicated | 2 | 0 | 0 | 2
    Total non-dup-rules | 5 | 6 | 2 | 13
    Bad Rules | 2 | 0 | 0 | 2
    Good Rules | 3 | 6 | 2 | 11
  • details
2015
  • Baseline: 120
    • 2 of the 15 new rules are child-rules of existing rules and were not added
    • 120 = 107 + 15 - 2
  • Unique Rules: 101
  • Good Rules: 76
    • 4 good new rules are parent-rules of 4 existing rules (+0)
    • 2 good new rules are parent-rules of 4 existing rules (-2)
    • 5 good new rules have no parent-child relationship with existing rules (+5)
    • 76 = 73 + 0 - 2 + 5
New Rules for the 2016 release: 12
  • Rules | ES (Expert-Suggest) | NOM_D | ORG_D | Sub-Total
    Total Rules | 2 | 5 | 5 | 12
    Duplicated | 0 | 1 | 0 | 1
    Total non-dup-rules | 2 | 4 | 5 | 11
    Bad Rules | 1 | 1 | 1 | 3
    Good Rules | 1 | 3 | 4 | 8
  • details
2016
  • Baseline: 132
    • 1 child-rule (nce$|noun|nt$|adj) was added to an existing rule in 2015
    • 1 of the 12 new rules is a duplicate and was not added
    • 132 = 120 + 1 + 12 - 1
  • Unique Rules: 111
  • Good Rules: 82
New Rules for the 2017 release: 11
  • Rules | ES (Expert-Suggest) | NOM_D | ORG_D | Sub-Total
    Total Rules | 2 | 5 | 4 | 11
    Duplicated | 0 | 1 | 0 | 1
    Total non-dup-rules | 2 | 4 | 4 | 10
    Bad Rules | 2 | 0 | 2 | 4
    Good Rules | 0 | 4 | 2 | 6
  • details
2017
  • Baseline: 142
    • 1 of the 11 new rules is a duplicate and was not added
    • 142 = 132 + 11 - 1
  • Unique Rules: 119
  • Good Rules: 86
New Rules for the 2020 release: 18
  • Rules | ES (Expert-Suggest) | NOM_D | ORG_D | Sub-Total
    Total Rules | 2 | 10 | 6 | 18
    Duplicated | 0 | 5 | 2 | 7
    Total non-dup-rules | 2 | 5 | 4 | 11
    Bad Rules | 2 | 0 | 2 | 4
    Good Rules | 0 | 5 | 2 | 7
  • details
2020
  • Baseline: 153
    • 7 of the 18 new rules are duplicates and were not added
    • 153 = 142 + 18 - 7
  • Unique Rules: 130
  • Good Rules: 93
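The transaction history is bookkeeping that can be checked mechanically: in each column of a new-rule breakdown, every non-duplicated rule is rated either good or bad, and each baseline equals the previous baseline plus the non-duplicated new rules (plus, for 2016, the one child-rule added to an existing rule). The Python sketch below encodes those relations with the counts from the history; `Breakdown`, `next_baseline` and `next_good` are hypothetical names, and reading the 2015 good-rule adjustments as "new parent-rules replacing existing good rules" is an interpretation of the bullets above, not a documented procedure.

```python
from dataclasses import dataclass

Counts = tuple[int, int, int]  # (ES, NOM_D, ORG_D)


@dataclass
class Breakdown:
    """One batch of new SD-Rules, counted per source column."""
    total: Counts
    duplicated: Counts
    bad: Counts
    good: Counts

    def check(self) -> None:
        # In every column, each non-duplicated rule is rated good or bad.
        for t, d, b, g in zip(self.total, self.duplicated, self.bad, self.good):
            assert t - d == b + g, (t, d, b, g)


# New-rule breakdowns from the transaction history, keyed by the release
# that received them. Values are (ES, NOM_D, ORG_D).
BATCHES = {
    2015: Breakdown((7, 6, 2), (2, 0, 0), (2, 0, 0), (3, 6, 2)),
    2016: Breakdown((2, 5, 5), (0, 1, 0), (1, 1, 1), (1, 3, 4)),
    2017: Breakdown((2, 5, 4), (0, 1, 0), (2, 0, 2), (0, 4, 2)),
    2020: Breakdown((2, 10, 6), (0, 5, 2), (2, 0, 2), (0, 5, 2)),
}

# Child-rules added to existing rules outside a new-rule batch.
EXTRA_CHILD_RULES = {2016: 1}  # nce$|noun|nt$|adj


def next_baseline(prev: int, batch: Breakdown, extra_children: int = 0) -> int:
    """Previous baseline + non-duplicated new rules + added child-rules."""
    return prev + sum(batch.total) - sum(batch.duplicated) + extra_children


def next_good(prev_good: int, groups: list[tuple[int, int]]) -> int:
    """Hypothetical: each group is (good new rules, existing good rules they
    replace as parent-rules); the net change per group is new - replaced."""
    return prev_good + sum(new - replaced for new, replaced in groups)


baseline = {2014: 107}
prev = 2014
for year, batch in BATCHES.items():
    batch.check()
    baseline[year] = next_baseline(baseline[prev], batch, EXTRA_CHILD_RULES.get(year, 0))
    prev = year

print(baseline)  # {2014: 107, 2015: 120, 2016: 132, 2017: 142, 2020: 153}
# 2015 good rules: 4 new parents of 4 rules, 2 new parents of 4 rules, 5 unrelated.
print(next_good(73, [(4, 4), (2, 4), (5, 0)]))  # 76
```

The recomputed baselines match the 120, 132, 142 and 153 listed above, and the per-column checks pass for all four batches.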

The transaction history is not tracked from the 2021 release onward.

Details:

In conclusion, the optimized set of SD-Rules is very steady (consistent), as expected.