Lexical Tools

SD-Rule Transaction Details: 2015 to 2016

The SD-Rule transactions are detailed below:

  • The following table shows the transactions for the 12 newly proposed SD-Rules in 2016.

    ID       Proposed New Rule    Source   Results  Rank & Rule 2015          Rank & Rule 2016              Type            Count Change  Accu. Count

    Computer Generated SD-Rules
    01-CG1   e$|verb|is$|noun     nomD     Good     34: ose$|verb|osis$|noun  27: se$|verb|sis$|noun        Parent-1-Child  +0            75*
    02-CG2   sia$|noun|tic$|adj   orgD     Good     None                      40: sia$|noun|tic$|adj        New in 2016     +1            76
    03-CG3   on$|noun|ve$|adj     orgD     Good     None                      48: on$|noun|ve$|adj          New in 2016     +1            77
    04-CG4   e$|noun|ic$|adj      orgD     Good     None                      49: e$|noun|ic$|adj           New in 2016     +1            78
    05-CG5   $|adj|ism$|noun      nomD     Good     None                      51: $|adj|ism$|noun           New in 2016     +1            79
    06-CG6   ation$|noun|ed$|adj  nomD     Good     None                      67: ation$|noun|ed$|adj       New in 2016     +1            80
    07-CG7   $|noun|ship$|noun    orgD     Good     None                      70: $|noun|ship$|noun         New in 2016     +1            81
    08-CG8   e$|adj|ion$|noun     nomD     Bad      None                      88: e$|adj|ion$|noun          New in 2016     +0            81
    09-CG9   $|noun|age$|noun     orgD     Bad      None                      96: $|noun|age$|noun          New in 2016     +0            81
    10-CG10  e$|verb|ing$|noun    nomD     Good     44: e$|verb|ing$|noun     47: e$|verb|ion$|noun         Duplicate       +0            81

    Expert-Suggested SD-Rules
    11-ES1   esis$|noun|ic$|adj   Experts  Good     None                      13: genesis$|noun|genic$|adj  New in 2016     +1            82
    12-ES2   al$|adj|ine$|noun    Experts  Bad      None                      98: al$|adj|ine$|noun         New in 2016     +0            82

    * 75 of the 76 good SD-Rules in 2015 are still evaluated as good rules in 2016. Each is either identical or replaced by its parent-rule or child-rule. Only the lowest-ranked rule (rank 76) in the previous optimal set, ar$|adj|e$|noun, is evaluated as a bad rule in the 2016 release.

  • Good SD-Rules count in Optimal Set:
    • 2015 has 76 good rules, while 2016 has 82 good rules in the optimal set.
    • In the evaluation, 8 of the 12 new rules are good (3 are bad; 1 is a duplicate). Why does the total number of good SD-Rules increase by only 6 (from 76 to 82) rather than by 8 (to 84)? Because:
      • 1 good rule from 2015 falls below the cutoff and becomes a bad rule (-1).
      • 1 of the good new rules is the parent-rule of an existing rule (+0).
      • 7 new rules have no parent-child relationship with any existing rule (+7).

      • So the total change is 7 - 1 = 6.
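
    The reconciliation above can be checked with a short tally. The following is a minimal sketch, not part of the Lexical Tools code: the tuples simply restate the Results and Type columns of the transaction table, and the names `proposed`, `adds_to_optimal_set`, `added`, and `dropped` are hypothetical.

```python
# Results and Type for each of the 12 proposed rules, copied from the table.
proposed = [
    ("01-CG1", "Good", "Parent-1-Child"),
    ("02-CG2", "Good", "New in 2016"),
    ("03-CG3", "Good", "New in 2016"),
    ("04-CG4", "Good", "New in 2016"),
    ("05-CG5", "Good", "New in 2016"),
    ("06-CG6", "Good", "New in 2016"),
    ("07-CG7", "Good", "New in 2016"),
    ("08-CG8", "Bad", "New in 2016"),
    ("09-CG9", "Bad", "New in 2016"),
    ("10-CG10", "Good", "Duplicate"),
    ("11-ES1", "Good", "New in 2016"),
    ("12-ES2", "Bad", "New in 2016"),
]

def adds_to_optimal_set(result, kind):
    # Only a good rule with no parent-child or duplicate relation to an
    # existing rule increases the count of good rules.
    return result == "Good" and kind == "New in 2016"

good_2015 = 76
dropped = 1  # ar$|adj|e$|noun fell below the cutoff in 2016
added = sum(adds_to_optimal_set(result, kind) for _, result, kind in proposed)
good_2016 = good_2015 - dropped + added

print(f"added={added}, dropped={dropped}, good_2016={good_2016}")
# prints: added=7, dropped=1, good_2016=82
```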

  • Good Rules comparison (2015-2016):
    Type                  2015  2016  Details
    No Change             74    74    ...
    Good rule turned bad  1     0     ar$|adj|e$|noun
    Parent-1-Child        1     1     2015: 34: ose$|verb|osis$|noun
                                      2016: 27: se$|verb|sis$|noun
    New in 2016           0     7     13: genesis$|noun|genic$|adj
                                      40: sia$|noun|tic$|adj
                                      48: on$|noun|ve$|adj
                                      49: e$|noun|ic$|adj
                                      51: $|adj|ism$|noun
                                      67: ation$|noun|ed$|adj
                                      70: $|noun|ship$|noun
    Total                 76    82

  • In our process, we analyze the parent-child hierarchy only for SD-Rules whose parent and child rules co-exist in the collected set, because evaluating all parent-child rules is very expensive (time-consuming). Should we modify the process as follows?
    • Normalize every SD-Rule to its root-parent-rule.
    • Analyze the parent-child hierarchy for all SD-Rules.

    In 2016, we spent about 2 weeks evaluating 16 parent rules. With the modified process there would be 101 parent rules to evaluate, which would be very expensive.
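
    The proposed normalization step could look roughly like the sketch below. It rests on an assumption that is not stated on this page: that a root-parent-rule is obtained by stripping the longest leading string shared by the two suffixes of a rule. That reading is consistent with the two cases in the tables above (ose$|verb|osis$|noun under e$|verb|is$|noun, and genesis$|noun|genic$|adj under esis$|noun|ic$|adj), but the helper names `parse_rule` and `root_parent` are hypothetical and not part of the Lexical Tools.

```python
import os
from collections import defaultdict

def parse_rule(rule):
    """Split 'ose$|verb|osis$|noun' into ('ose', 'verb', 'osis', 'noun')."""
    in_suffix, in_cat, out_suffix, out_cat = rule.split("|")
    return in_suffix.rstrip("$"), in_cat, out_suffix.rstrip("$"), out_cat

def root_parent(rule):
    """Strip the leading characters shared by both suffixes (assumed definition)."""
    in_suffix, in_cat, out_suffix, out_cat = parse_rule(rule)
    # os.path.commonprefix compares character by character, so it works on any strings.
    shared = len(os.path.commonprefix([in_suffix, out_suffix]))
    return f"{in_suffix[shared:]}$|{in_cat}|{out_suffix[shared:]}$|{out_cat}"

rules = [
    "ose$|verb|osis$|noun",
    "se$|verb|sis$|noun",
    "genesis$|noun|genic$|adj",
    "$|adj|ism$|noun",
]

# Group rules into families that share one root-parent-rule.
families = defaultdict(list)
for rule in rules:
    families[root_parent(rule)].append(rule)

for parent, members in families.items():
    print(parent, "<-", members)
```

    Grouping by root-parent-rule shows how many parent-child hierarchies would need evaluation; each key in `families` is one hierarchy to analyze.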

In conclusion, the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?