
Lexical Tools

SD-Rule Transaction Details: 2014 to 2015

The SD-Rule transactions are detailed below:

  • Baseline of candidate SD-Rule count:
    • The 2014 baseline collects 107 SD-Rules.
    • The 2015 baseline collects 120 SD-Rules: the 107 SD-Rules collected in 2014 plus 15 new SD-Rules, two of which are duplicates because they are child-rules (120 = 107 + 15 - 2).
    • The baseline set is then processed to remove duplicates arising from parent-child relationships. In 2015, 19 child-rules are removed from the 120 baseline SD-Rules, leaving 101 unique SD-Rules (120 - 19 = 101).

  • Good SD-Rules count in Optimal Set:
    • The optimal set has 73 good rules in 2014 and 76 good rules in 2015.
    • All 73 good SD-Rules in 2014 remain good rules in 2015, either unchanged or replaced by their parent-rules or child-rules.
    • From the evaluation, 11 of the 15 new rules are good. Why does the total number of good SD-Rules increase by only 3 (from 73 to 76), rather than to 84 (73 + 11)? The reasons are:
      • 4 of the new rules are parent-rules of 4 existing rules, each replacing one rule (+0).
      • 2 of the new rules are parent-rules of 4 existing rules, each replacing two rules (-2).
      • 5 new rules have no parent-child relationship with any existing rule (+5).

      • So, the total change is 5 - 2 = 3.
      This involves complicated child-parent rule situations; please see SD-Rule rank mapping for details. They are summarized below:

      Type            2014  2015  Details
      No Change         65    65  ...
      Parent-1-Child     4     4  2014 rule -> 2015 rule:
                                  02: ability$|noun|able$|adj -> 09: ility$|noun|le$|adj
                                  08: ic$|adj|ically$|adv -> 15: $|adj|ally$|adv
                                  21: ency$|noun|ent$|adj -> 19: cy$|noun|t$|adj
                                  55: ion$|noun|ional$|adj -> 70: $|noun|al$|adj
      Parent-2-Child     4     2  2014 rules -> 2015 rule:
                                  16: ance$|noun|ant$|adj, 18: ence$|noun|ent$|adj -> 18: nce$|noun|nt$|adj
                                  10: ate$|verb|ation$|noun, 63: se$|verb|sion$|noun -> 20: e$|verb|ion$|noun
      New in 2015        0     5  02: se$|verb|zation$|noun
                                  03: sation$|noun|ze$|verb
                                  45: e$|verb|ing$|noun
                                  61: al$|adj|us$|noun
                                  67: es$|noun|ic$|adj
      Total             73    76
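
As a quick consistency check on the summary table above, the following minimal sketch sums each column and recomputes the net change. The dictionary name `summary` and its layout are illustrative assumptions, not part of the Lexical Tools; the counts are copied from the table.

```python
# Counts per type copied from the summary table: (2014, 2015).
# The variable names are illustrative, not part of the Lexical Tools.
summary = {
    "No Change": (65, 65),
    "Parent-1-Child": (4, 4),
    "Parent-2-Child": (4, 2),
    "New in 2015": (0, 5),
}

total_2014 = sum(before for before, _ in summary.values())
total_2015 = sum(after for _, after in summary.values())
net_change = total_2015 - total_2014

print(total_2014, total_2015, net_change)  # 73 76 3
```

The net change of 3 matches the 5 - 2 reasoning: the Parent-2-Child row loses two rules and the New in 2015 row adds five.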

    • The following table shows the transactions for the 15 newly proposed SD-Rules in 2015.

      ID      Proposed New Rule      Source   Results  Rank & Rule 2015           Rank & Rule 2014             Type              Count Change  Accu. Count
      Computer Generated SD-Rules:
      01-CG1  se$|verb|zation$|noun  nomD     Good     02: se$|verb|zation$|noun  None                         New in 2015       +1            74
      02-CG2  sation$|noun|ze$|verb  nomD     Good     03: sation$|noun|ze$|verb  None                         New in 2015       +1            75
      03-CG3  ility$|noun|le$|adj    nomD     Good     09: ility$|noun|le$|adj    02: ability$|noun|able$|adj  Parent-1-Child    +0            75
      04-CG4  $|adj|ally$|adv        orgD     Good     15: $|adj|ally$|adv        08: ic$|adj|ically$|adv      Parent-1-Child    +0            75
      05-CG5  nce$|noun|nt$|adj      nomD     Good     18: nce$|noun|nt$|adj      16: ance$|noun|ant$|adj      Parent-2-Child    -1            74
                                                                                  18: ence$|noun|ent$|adj
      06-CG6  cy$|noun|t$|adj        nomD     Good     19: cy$|noun|t$|adj        21: ency$|noun|ent$|adj      Parent-1-Child    +0            74
      07-CG7  e$|verb|ion$|noun      nomD     Good     20: e$|verb|ion$|noun      10: ate$|verb|ation$|noun    Parent-2-Child    -1            73
                                                                                  63: se$|verb|sion$|noun
      08-CG8  c$|adj|s$|noun         orgD     Good     43: ic$|adj|is$|noun       41: ic$|adj|is$|noun         Child             +0            73
      Expert-Suggested SD-Rules:
      09-ES1  e$|verb|ing$|noun      Experts  Good     45: e$|verb|ing$|noun      None                         New in 2015       +1            74
      10-ES2  al$|adj|us$|noun       Experts  Good     61: al$|adj|us$|noun       None                         New in 2015       +1            75
      11-ES3  es$|noun|ic$|adj       Experts  Good     67: es$|noun|ic$|adj       None                         New in 2015       +1            76
      12-ES4  $|noun|ize$|verb       Experts  Bad      78: $|noun|ize$|verb       None                         New               +0            76
      13-ES5  es$|noun|ic$|noun      Experts  Bad      101: es$|noun|ic$|noun     None                         New               +0            76
      14-ES6  ian$|adj|ia$|noun      Experts  Good     57: a$|noun|an$|adj        53: a$|noun|an$|adj          Duplicated-Child  +0            76
      15-ES7  ian$|noun|ia$|noun     Experts  Bad      99: a$|noun|an$|noun       93: a$|noun|an$|noun         Duplicated-Child  +0            76
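
The accumulated count column can be reproduced as a running sum of the count changes, starting from the 73 good rules of 2014. The sketch below is only an illustration; the list `changes` is transcribed from the table in row order, and the variable names are assumptions.

```python
from itertools import accumulate

# Count change per proposed rule, in table order (01-CG1 through 15-ES7).
changes = [+1, +1, 0, 0, -1, 0, -1, 0, +1, +1, +1, 0, 0, 0, 0]
start = 73  # good rules in the 2014 optimal set

# accumulate(..., initial=start) yields the start value first, so drop it.
running = list(accumulate(changes, initial=start))[1:]

print(running)
# [74, 75, 75, 75, 74, 74, 73, 73, 74, 75, 76, 76, 76, 76, 76]
```

The final value, 76, is the number of good rules in the 2015 optimal set.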

    • In the evaluation process, we removed two proposed new rules (ES6 and ES7) because they are child-rules of existing rules. After normalization (alphabetic order and use of the root-parent-rule), they duplicate existing rules. Thus, we did not analyze the parent-child hierarchy for these two rules. Should we analyze them in future releases?
    • In our process, we only analyze the parent-child hierarchy for SD-Rules whose parent and child rules co-exist in the collected set, because the analysis is very expensive. Should we modify the process to:
      • Normalize all SD-Rules to their root-parent-rule.
      • Analyze the parent-child hierarchy for all SD-Rules.

      In 2015, we have 14 parent-rules. If we adopt this modified process, there will be 101 parent-rules, which is very expensive.
    • 2015 has 10 more root parent rules.
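
The page does not spell out how a parent-child relationship between two SD-Rules is tested. The sketch below is one possible reading, inferred only from the rule pairs listed in the tables: a child-rule is treated as a parent-rule whose two suffixes are both extended by the same leading string, with matching categories. The names `SdRule`, `parse_rule`, and `is_child_of` are hypothetical and are not part of the Lexical Tools. Finding the root-parent-rule is deliberately left out, since the source does not define it.

```python
from typing import NamedTuple


class SdRule(NamedTuple):
    suffix1: str
    cat1: str
    suffix2: str
    cat2: str


def parse_rule(text: str) -> SdRule:
    """Parse e.g. 'ility$|noun|le$|adj'; put the two halves in alphabetic order."""
    s1, c1, s2, c2 = text.split("|")
    # Drop the trailing '$' end-of-word marker, then sort by suffix (assumed
    # to be what "alphabetic order" normalization means).
    first, second = sorted([(s1.rstrip("$"), c1), (s2.rstrip("$"), c2)])
    return SdRule(first[0], first[1], second[0], second[1])


def is_child_of(child: SdRule, parent: SdRule) -> bool:
    """Assumed test: both child suffixes are the parent suffixes with the
    same non-empty leading string prepended, and the categories match."""
    if (child.cat1, child.cat2) != (parent.cat1, parent.cat2):
        return False
    if not (child.suffix1.endswith(parent.suffix1)
            and child.suffix2.endswith(parent.suffix2)):
        return False
    lead1 = child.suffix1[: len(child.suffix1) - len(parent.suffix1)]
    lead2 = child.suffix2[: len(child.suffix2) - len(parent.suffix2)]
    return lead1 == lead2 and lead1 != ""


# Rule pairs taken from the tables on this page.
print(is_child_of(parse_rule("ability$|noun|able$|adj"),
                  parse_rule("ility$|noun|le$|adj")))   # True
print(is_child_of(parse_rule("ian$|adj|ia$|noun"),
                  parse_rule("a$|noun|an$|adj")))       # True
print(is_child_of(parse_rule("al$|adj|us$|noun"),
                  parse_rule("e$|verb|ion$|noun")))     # False
```

Under this reading, comparing every rule against every other rule grows quadratically with the number of rules, which is consistent with the concern above that analyzing all 101 rules would be expensive.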

In conclusion, the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?