Local Optimization - Evaluate PARENT rules and their CHILD rules
The SD-Rule set includes the latest SD rules that are used to generate SD pairs in the Lexicon (data - dm.data). This set includes some PARENT and CHILD SD-Rules that need to be evaluated so that the one(s) with the best performance (F1) can be chosen for the optimized SD-Rule set (dm.rul) used in the Lexical Tools. This page describes the evaluation and optimization procedures as follows:
I. Identify all rules for evaluation - PARENT, NEW, and previous better Rules
shell> cd ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesCheck/
shell> mkdir decompose.40.25
(40: min. local occurrence rate, 25: min. local cover-recall rate)
shell> ln -sf ./decompose.40.25 decompose
shell> sort -u ../../../data/suffixD.yesNo.data > ./suffixD.yesNo.data.uSort
shell> flds 1,2,4,5,7 ./suffixD.yesNo.data.uSort > suffixD.yesNo.data.uSort.1.2.4.5.7
shell> ln -sf ./suffixD.yesNo.data.uSort.1.2.4.5.7 sdPairs.data
suffix-1|pos-1|suffix-2|pos-2
: remove the rest of the line
esis$|noun|ic$|adj
in if this rule is not there. (This rule was evaluated with better F1 with CHILD from previous experience)
II. Decompose CHILD rules on identified rules
7
to retrieve all good candidate CHILD rules.
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
7
40 (min. occurrence rate - for decompose)
25 (35) (min. coverage rate - for candidate child)
A CHILD rule must have a higher accuracy rate (precision) than its root PARENT rule and must meet the min. coverage rate (recall, default is 25%). Manually look through the output file sdRule.decompose.out and search for "<= Candidate"; these candidates are the CHILD rules that match the following criteria:
shell> mv sdRules.decompose.out sdRules.decompose.out.${NO}.${RULE}
shell> mv sdRules.decompose.out sdRules.decompose.out.01.X-ally
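The candidate test described in this step can be sketched as a small predicate. This is a minimal sketch, not the GetSdRule implementation: the function name `is_candidate_child` is hypothetical, and it assumes the coverage rate is the CHILD rule's occurrence count divided by the PARENT rule's occurrence count, which this page does not spell out. The sample figures are those of the X-ally PARENT rule and its c$|adj|cally$|adv CHILD rule listed in Section III.

```python
def is_candidate_child(parent_precision, parent_occurrence,
                       child_precision, child_occurrence,
                       min_coverage=0.25):
    """Return True if a CHILD rule qualifies as a candidate.

    Assumption: coverage = child occurrences / parent occurrences.
    Precision values are percentages, as printed in the stats files.
    """
    if parent_occurrence <= 0:
        return False
    coverage = child_occurrence / parent_occurrence
    return child_precision > parent_precision and coverage >= min_coverage


# PARENT $|adj|ally$|adv: 98.99% over 2086 pairs
# CHILD  c$|adj|cally$|adv: 99.95% over 1966 pairs
print(is_candidate_child(98.99, 2086, 99.95, 1966))
```

Raising `min_coverage` to 0.35 corresponds to the alternative threshold shown in parentheses in the GetSdRule inputs above.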
III. Optimize SD rule set by evaluating and selecting the best PARENT and CHILD rules
Go through all decomposed CHILD rules from the above steps (./dataR/SdRulesCheck/decompose/).
shell> mkdir ${NO}.${RULE}
shell> mkdir 01.X-ally
shell> cd 01.X-ally
shell> cp -p ../00.baseline/sdRules.stats.in sdRules.stats.in.testing
shell> ln -sf ./sdRules.stats.in.testing sdRules.stats.in
<= Candidate
from ../../SdRulesCheck/decompose/sdRule.decompose.out.1.X-ally to this file under the associated PARENT rule
#======================================================================
# Rank|Precision|Occurrence|Yes|No|Tbd|SD-Rule|YEAR|SOURCE|RELATIONSHIP
#======================================================================
#25|98.99%|2086|2065|21|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
251|99.95%|1966|1965|1|0|c$|adj|cally$|adv|2024|DECOMPOSE|CHILD
#252|99.95%|1961|1960|1|0|ic$|adj|ically$|adv|2024|DECOMPOSE|CHILD
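To make the layout of these stats lines concrete, the following is a minimal Python sketch of a parser for one line. The names `SdRuleStat` and `parse_stat_line` are hypothetical and not part of the Lexical Tools. The sketch assumes that a leading `#` on a rule line means the rule is switched off for the current test run, which is an interpretation of the sample above and is not stated on this page. Note that the SD-Rule itself spans four pipe-separated fields (suffix-1|pos-1|suffix-2|pos-2), so a rule line has 13 fields in total.

```python
from dataclasses import dataclass


@dataclass
class SdRuleStat:
    rank: int
    precision: float   # percent, as printed in the file
    occurrence: int
    yes: int
    no: int
    tbd: int
    rule: str          # suffix-1|pos-1|suffix-2|pos-2
    year: int
    source: str
    relationship: str
    enabled: bool      # assumption: False when the line starts with '#'


def parse_stat_line(line):
    """Parse one rule line; return None for separator and header lines."""
    text = line.strip()
    enabled = not text.startswith("#")
    fields = text.lstrip("#").strip().split("|")
    # 6 numeric fields + 4 rule fields + year, source, relationship = 13
    if len(fields) != 13 or not fields[0].isdigit():
        return None
    return SdRuleStat(
        rank=int(fields[0]),
        precision=float(fields[1].rstrip("%")),
        occurrence=int(fields[2]),
        yes=int(fields[3]),
        no=int(fields[4]),
        tbd=int(fields[5]),
        rule="|".join(fields[6:10]),
        year=int(fields[10]),
        source=fields[11],
        relationship=fields[12],
        enabled=enabled,
    )


row = parse_stat_line(
    "251|99.95%|1966|1965|1|0|c$|adj|cally$|adv|2024|DECOMPOSE|CHILD")
# The printed precision agrees with Yes / Occurrence for this row.
print(row.rule, row.enabled, round(100 * row.yes / row.occurrence, 2))
```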
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
1
others
01.X-ally
59911
<= total Yes from baseline (changes with each annual evaluation)
-- Optimum SD-Rules: 102|73.13%|67|49|18|0|$|verb|per$|noun|2024|WORDNET|SELF|95.29%|87.08%|1.8237|52168|54747
shell> cp -p sdRules.stats.in.testing sdRules.stats.in.01.1
shell> mv sdRules.stats.out.html sdRules.stats.out.01.1.html
shell> cp -p sdRules.stats.out.01.1.html ${WEB_LVG}/docs/designDoc/UDF/derivations/SD-Rules-Opti/Ex-${YEAR}/.
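The system figures at the end of the "Optimum SD-Rules" line in this step can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `system_metrics` is hypothetical, and reading the last two fields of the log line (52168 and 54747) as the accumulated Yes count and the accumulated retrieved count is an inference. With the baseline total Yes of 59911, that reading reproduces the logged 95.29% and 87.08%, and the logged performance value 1.8237 matches precision plus recall.

```python
def system_metrics(acc_yes, acc_retrieved, total_yes):
    """Return (precision, recall, performance) for an accumulated rule set.

    Assumption: performance = precision + recall, which matches the
    logged values on this page; it is not a documented formula.
    """
    precision = acc_yes / acc_retrieved
    recall = acc_yes / total_yes
    return precision, recall, precision + recall


p, r, perf = system_metrics(52168, 54747, 59911)
print(f"{p:.2%}|{r:.2%}|{perf:.4f}")
```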
IV. Optimize SD rule set Results
Please refer to the result of optimization log for details of each step for these parent-child rules optimization processes.
The final optimized set of SD-Rules includes 162 unique PARENT/SELF/CHILD SD-Rules. They are sorted in descending order of precision (= relevant retrieved No. / retrieved No.) and then by retrieved No. rate. The top 105 SD-Rules are used as the optimized SD-Rule set, which reaches 95.31% system (accumulated) precision and 87.57% system (accumulated) recall, for a system performance of 1.8288. The total number of valid instances is 59911.
- Total line no: 229
-- Total comment no: 67
-- Total Sd-Rule no: 162
---------------------------------------
-- Optimum SD-Rules: 105|73.17%|41|30|11|0|e$|noun|ery$|noun|2013|ORG_RULE|SELF|95.31%|87.57%|1.8288|52464|55048
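The ranking and accumulation described in this section can be illustrated with a short Python sketch. It is not the GetSdRule code: `rank_and_accumulate` is a hypothetical name, the secondary sort key is assumed to be the retrieved count, and the three input rows are individual rules quoted elsewhere on this page, so they do not form a real rule set. The sketch only shows how accumulated precision and recall evolve as lower-precision rules are appended.

```python
def rank_and_accumulate(rules, total_yes):
    """rules: list of (name, yes, retrieved) tuples.

    Sort by precision (yes / retrieved) descending, then by retrieved
    count descending (assumed tie-breaker), and return one row per rank:
    (rank, name, accumulated yes, accumulated retrieved, precision, recall).
    """
    ordered = sorted(rules, key=lambda r: (-r[1] / r[2], -r[2]))
    rows, acc_yes, acc_retrieved = [], 0, 0
    for rank, (name, yes, retrieved) in enumerate(ordered, start=1):
        acc_yes += yes
        acc_retrieved += retrieved
        precision = acc_yes / acc_retrieved
        recall = acc_yes / total_yes
        rows.append((rank, name, acc_yes, acc_retrieved, precision, recall))
    return rows


rules = [
    ("$|verb|per$|noun", 49, 67),
    ("e$|noun|ery$|noun", 30, 41),
    ("c$|adj|cally$|adv", 1965, 1966),
]
for row in rank_and_accumulate(rules, total_yes=59911):
    print(row)
```

Each appended rule raises recall and, once its own precision falls below the accumulated precision, lowers system precision; the cutoff rank trades one against the other.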
V. POST-Process: generate SD Rule Trie
Generate SD-Rule trie from this 105/162 optimized set for Lexical tools SD-Rule generation.
shell> cd ./dataR
shell> cp ./37.ity-y/sdRules.stats.out ./37.ity-y/sdRules.stats.out.opti
shell> ln -sf ./SdRulesOptimum/37.ity-y/sdRules.stats.out.opti sdRules.stats.out
shell> cd ./bin
shell> GetSdRule ${YEAR}
8
105
(the good rules)
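For readers unfamiliar with the idea, the following Python sketch shows one way suffix rules can be stored in a trie keyed on reversed suffixes. The format of the trie that GetSdRule actually generates is not described on this page, so everything here is an assumption for illustration: the class `SuffixTrie`, its methods, and the decision to ignore part of speech are all hypothetical. The trailing `$` of the rule notation is treated as an end-of-word marker and is dropped before insertion.

```python
class SuffixTrie:
    """Trie keyed on reversed suffixes, so lookup walks a word from its end."""

    RULES = "$rules"  # multi-character key, cannot clash with a single letter

    def __init__(self):
        self.root = {}

    def add(self, suffix, rule):
        """Store `rule` under `suffix` (given without the trailing '$')."""
        node = self.root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node.setdefault(self.RULES, []).append(rule)

    def match(self, word):
        """Return rules whose source suffix ends `word`, longest suffix first."""
        node = self.root
        found = list(node.get(self.RULES, []))  # empty suffix matches any word
        for ch in reversed(word):
            node = node.get(ch)
            if node is None:
                break
            found = node.get(self.RULES, []) + found
        return found


trie = SuffixTrie()
for rule in ("$|adj|ally$|adv", "c$|adj|cally$|adv", "ic$|adj|ically$|adv"):
    source_suffix = rule.split("|")[0].rstrip("$")
    trie.add(source_suffix, rule)

print(trie.match("basic"))
```

Returning the longest matching suffix first lets a caller prefer the more specific CHILD rule over its PARENT rule when both apply.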