TC Package - PreProcess Procedures
This preprocess should be performed after the baseline software for the new release is completed. Please follow the annual release procedures for a new release. This page details the preprocess procedures for generating files for JDI, STI, and STRI. Please refer to the PreProcess Design & Requirements section for design details.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE SerialsSet PUBLIC "-//NLM//DTDSERIALS, 1st January 2010//EN"
"http://www.nlm.nih.gov/databases/dtd/nlmserials_100101.dtd">
- shell> cd ${TC}/preProcess/tcPre2008/bin
- shell> 0.GetMedLineFiles
2010
2010
${YEAR}
11
- shell> cd ${TC_DIR}/tcPre2008/data/2010/Jdi
- shell> cp -rp Output Output.tc${YEAR}
- shell> 2.DeployJdiFilesToTc
${YEAR}
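The backup step above (`cp -rp Output Output.tc${YEAR}`) behaves differently if the target directory already exists: `cp -rp` then places a nested copy inside it instead of creating a fresh backup. A minimal sketch of a guarded copy follows; the function name `backup_output` and its return codes are hypothetical and not part of the TC package.

```shell
#!/bin/sh
# backup_output DIR YEAR
# Copy DIR/Output to DIR/Output.tc<YEAR>, preserving attributes (cp -rp),
# but refuse to run if the source is missing or the backup already exists.
# Hypothetical helper for illustration; not shipped with the TC package.
backup_output() {
    dir=$1
    year=$2
    src="$dir/Output"
    dst="$dir/Output.tc$year"
    if [ ! -d "$src" ]; then
        echo "missing source directory: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "backup already exists: $dst" >&2
        return 2
    fi
    cp -rp "$src" "$dst"
}

# Example call, assuming TC_DIR and YEAR are set as in the steps above:
#   backup_output "${TC_DIR}/tcPre2008/data/2010/Jdi" "${YEAR}"
```

Refusing to overwrite keeps an earlier backup intact if the step is run twice for the same year.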
word | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
risk | 464482 | | | | |
cancer | 388950 | 645291 | 705814 | 754647 | 792053 |
blood | 510753 | 608233 | 629776 | 644743 | 671190 |
therapy | 444975 | 645880 | 682715 | 695532 | 713875 |
function | | | | | |
case | 430815 | 699541 | 723212 | 756545 | |
Max. Signal | 510754 | 645881 | 705815 | 754648 | 792054 |
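In the table above, each Max. Signal value is one more than the largest count among the cancer, blood, and therapy rows for that year (for example, 792053 + 1 = 792054 for 2011). The case row does not follow this pattern, and this page does not say why. A minimal sketch of that arithmetic, assuming the "largest count plus one" rule is what is intended; the function name `max_signal` and the one-count-per-line input layout are hypothetical.

```shell
#!/bin/sh
# max_signal: read one count per line on stdin and print (largest count + 1).
# Assumption: Max. Signal = largest listed count + 1, as the table suggests.
max_signal() {
    awk 'BEGIN { max = 0 }
         NF    { if ($1 + 0 > max) max = $1 + 0 }
         END   { printf "%d\n", max + 1 }'
}

# 2011 counts for cancer, blood, and therapy, taken from the table above.
printf '%s\n' 792053 671190 713875 | max_signal
```

With the 2011 counts this prints 792054, matching the Max. Signal row.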
- shell> 1.TestJdi (15 min.)
previous year
current year
7
Releases | WordJdidWc | WordJdidDc | MhJdidDc | ShJdidDc |
---|---|---|---|---|
2008~2009 | 97.08% | 97.69% | 99.04% | 99.99% |
2009~2010 | 96.37% | 97.01% | 98.64% | 99.82% |
2010~2011 | 96.49% | 97.10% | 98.68% | 99.76% |
From JDI:
${TC_VERSION}
${DATA_YEAR}
10
input Max Signal
shell> 4.Deploy1stRunStriFilesToTc ${YEAR}
${TC_YEAR}
${DATA_YEAR}
11
shell> 5.RefineStDoc
${YEAR}
${YEAR}
1
1
2
humn
1
1
...
-- RefineStDocuments.RefineStDocuments(): humn, word Size: 28
1. applicant|humn|T016|8|0.5825909|false|0.6438478(0.8119695-0.16812167)
2. applicants|humn|T016|4|0.57193226|false|0.67198503(0.82519585-0.15321079)
3. delegate|humn|T016|6|0.52379584|false|0.61971915(0.77649516-0.15677604)
4. descendent|humn|T016|88|0.32893714|false|0.51979005(0.64032584-0.12053579)
5. human|humn|false
6. human|humn|false
7. human|humn|false
8. human|humn|false
9. human|humn|false
10. humans|humn|T016|93|0.44167516|false|0.676633(0.82196045-0.14532742)
11. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
12. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
13. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
14. interviewee|humn|T016|24|0.4677229|false|0.61210704(0.80709076-0.19498374)
15. invoker|humn|false
16. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
17. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
18. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
19. owner|humn|T016|4|0.63383675|false|0.75085104(0.8832318-0.13238078)
20. owner|humn|T016|4|0.63383675|false|0.75085104(0.8832318-0.13238078)
21. producer|humn|T016|92|0.2529137|false|0.7038802(0.8824603-0.17858009)
22. recipient|humn|T016|105|0.11818392|false|0.39545894(0.4790922-0.083633274)
23. resident|humn|T016|69|0.5160713|false|0.630488(0.7693282-0.1388402)
24. sponsor|humn|T016|65|0.36086074|false|0.60545766(0.75192285-0.14646521)
25. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
26. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
27. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
28. user|humn|T016|39|0.25950813|false|0.7680406(0.8928807-0.1248401)
...
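In the sample output above, the last field of each full record has the form `d(a-b)`, and `d` agrees with `a - b` up to rounding (for example, 0.64032584 - 0.12053579 = 0.51979005). The sketch below checks that relation for each record. The meaning of the individual fields is not documented on this page, so the script only tests the visible arithmetic; the function name `check_refine_lines` and the 0.0001 tolerance are assumptions.

```shell
#!/bin/sh
# check_refine_lines: read RefineStDoc-style lines on stdin. For every line
# with 7 pipe-separated fields whose last field looks like "d(a-b)", print
# the word followed by "ok" if d matches a - b, otherwise "mismatch".
# Short records such as "human|humn|false" are skipped.
# Hypothetical helper; tolerance and field interpretation are assumptions.
check_refine_lines() {
    awk -F'|' '
    NF == 7 {
        word = $1
        sub(/^[ \t]*[0-9]+\.[ \t]*/, "", word)  # drop the leading "N. " index
        n = split($7, p, /[()]/)                 # "d(a-b)" -> p[1]=d, p[2]="a-b"
        if (n < 2) next
        split(p[2], q, "-")                      # "a-b" -> q[1]=a, q[2]=b
        delta = p[1] - (q[1] - q[2])
        if (delta < 0) delta = -delta
        printf "%s %s\n", word, (delta < 0.0001 ? "ok" : "mismatch")
    }'
}

# Two records copied from the output above, plus one short record.
printf '%s\n' \
    '1. applicant|humn|T016|8|0.5825909|false|0.6438478(0.8119695-0.16812167)' \
    '5. human|humn|false' \
    '4. descendent|humn|T016|88|0.32893714|false|0.51979005(0.64032584-0.12053579)' \
    | check_refine_lines
```

A check like this can help spot truncated or corrupted records when reviewing a long refinement log by hand.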
- shell> cd ${TC}/tc${YEAR}/bin/loadDb/
- shell> 2.AnalyzeInFiles ${YEAR}
- shell> cd ${TC}/tc${YEAR}/
- shell> ./bin/loadDb/3.LoadDb
4) Word-St Scores
shell> cd ${TEST}/TC/WsdTest/
shell> ${TEST}/TC/WsdTest/bin/2.TestWsd
shell> ${TEST}/TC/WsdTest/bin/3.TestWsdStats
shell> ${TEST}/TC/WsdTest/bin/4.TestAll
ST WSD Collections Tests (both train and test sets):
TC Version | Ambiguous Sentence (DC) | Ambiguous Sentence (WC) | Ambiguous Sentence (CS) | Ambiguous Sentences (DC) | Ambiguous Sentences (WC) | Ambiguous Sentences (CS) | Ti-AB (DC) | Ti-AB (WC) | Ti-AB (CS) |
---|---|---|---|---|---|---|---|---|---|
2007 | 74.61% | 75.00% | 74.91% | 74.95% | 75.39% | 75.05% | 74.05% | 74.32% | 74.32% |
2008 | 73.81% | 74.93% | 74.36% | 74.30% | 75.00% | 74.77% | 73.52% | 74.44% | 74.01% |
2009 | 77.37% | 77.11% | 76.91% | 76.79% | 76.72% | 76.62% | 76.13% | 76.65% | 76.12% |
2010 | 76.62% | 77.36% | 77.27% | 75.96% | 76.59% | 76.73% | 74.85% | 76.38% | 75.24% |
2011 | 77.11% | 77.53% | 77.24% | 76.00% | 77.10% | 76.49% | 74.82% | 76.81% | 75.55% |
shell> cd ${TEST}/TC/WsdTest2/
shell> ${TEST}/TC/WsdTest2/bin/2.TestWsd
shell> ${TEST}/TC/WsdTest2/bin/3.TestWsdStats
shell> ${TEST}/TC/WsdTest2/bin/4.TestAll
MSH WSD Set Tests:
The precision excludes cases where the answer cannot be found by StWSD:
Precision/Weighted Precision Test for MSH WSD set (both ambiguous abbreviations and ambiguous terms):
TC Version | Ambiguous Sentence (DC) | Ambiguous Sentence (WC) | Ambiguous Sentence (CS) | Ambiguous Sentences (DC) | Ambiguous Sentences (WC) | Ambiguous Sentences (CS) | Ti-AB (DC) | Ti-AB (WC) | Ti-AB (CS) |
---|---|---|---|---|---|---|---|---|---|
2007 | 70.66% 71.90% | 70.58% 72.19% | 70.70% 72.13% | 70.56% 71.34% | 70.59% 71.40% | 70.58% 71.56% | 70.79% 70.84% | 70.76% 71.31% | 70.79% 70.98% |
2008 | 70.42% 70.88% | 70.49% 71.33% | 70.48% 71.02% | 69.85% 70.63% | 70.09% 71.27% | 70.06% 71.08% | 69.54% 69.79% | 69.30% 69.67% | 69.23% 69.57% |
2009 | 66.63% 66.91% | 66.21% 66.83% | 66.44% 66.72% | 66.46% 67.14% | 65.79% 66.47% | 64.23% 66.74% | 66.93% 66.96% | 66.36% 66.56% | 66.78% 66.81% |
2010 | 65.86% 65.62% | 65.69% 66.05% | 65.72% 65.92% | 65.62% 65.96% | 65.42% 65.93% | 65.58% 66.03% | 66.12% 65.73% | 65.83% 65.85% | 66.05% 65.83% |
2011 | 67.09% 66.64% | 66.76% 66.93% | 67.00% 66.76% | 66.90% 66.43% | 66.89% 67.21% | 66.64% 66.55% | 67.20% 66.35% | 67.06% 66.67% | 67.05% 66.34% |