Option | Description | Input | Output | Notes | Option
|
---|
70 |
- Get Antonyms from MEDLINE 3-grams by a specified middle keyword (and/or):
- Medline.GetAntCandFrom3GramPatMid.java
|
- ${ML_DIR}/input/3-gram.${ML_YEAR}.30.core
- ${META_DIR}/input/normTermCui.data
- ${META_DIR}/input/MRSTY.RRF
- ${LEX_DIR}/input/inflVars.data
- ${LEX_DIR}/input/synonym.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
|
- ./output/PreCand/antCandPatMid.andOr.data
|
- This step is not used in the annual process, but it is used to debug a single keyword in Step-71.
- This step pre-runs Step-71 with one middle word in 3-grams to get collocates for antonyms. Run it to make sure everything works before running Step-71.
- If running for the first time:
- shell> mkdir ./output/PreCand
- make sure all input files are set up correctly
- Different data versions are used because the data sets have different release dates:
- Lexicon Antonym release: ${YEAR}
- META-thesaurus: ${PREV_YEAR}AA
- MEDLINE: ${PREV_YEAR}
- LVG: ${PREV_YEAR}
- This program sets the default keyword to "and/or".
| 70
|
71 |
- Get Antonyms from MEDLINE 3-grams by specified middle keywords
- Medline.GetAntCandFrom3GramPatMid.java
|
- ${ML_DIR}/input/3-gram.${YEAR}.30.core
- ${META_DIR}/input/normTermCui.data
- ${META_DIR}/input/MRSTY.RRF
- ${LEX_DIR}/input/inflVars.data
- ${LEX_DIR}/input/synonym.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
|
- ./output/PreCand/antCandPatMid.${KEY_WORD}.data
|
- Currently, this program includes the 9 highest-frequency keywords: [and], [or], [to], [versus], [than], [vs], [from], [nor], [and|or], as defined in the scripts.
- The latest data are used, in different versions because the data sets have different release dates:
- Lexicon Antonym release: ${YEAR}
- Lexicon: ${YEAR}
- META-thesaurus: ${PREV_YEAR}AA
- MEDLINE: ${PREV_YEAR}
- LVG: ${PREV_YEAR}
| 71
|
72 |
- Get Antonyms from MEDLINE 5-grams by specified middle keywords
- Medline.GetAntCandFrom5GramPatMid.java
|
- ${ML_DIR}/input/5-gram.${YEAR}.30.core
- ${META_DIR}/input/normTermCui.data
- ${META_DIR}/input/MRSTY.RRF
- ${LEX_DIR}/input/inflVars.data
- ${LEX_DIR}/input/synonym.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
|
- ./output/PreCand/antCandPatMid.${KEY_WORD}.data
|
- Currently, this program includes one keyword: "as well as", as defined in the scripts.
- The latest data are used, in different versions because the data sets have different release dates:
- Lexicon Antonym release: ${YEAR}
- Lexicon: ${YEAR}
- META-thesaurus: ${PREV_YEAR}AA
- MEDLINE: ${PREV_YEAR}
- LVG: ${PREV_YEAR}
| 72
|
|
75 |
- Get antCand by combining the results of steps 71 and 72
- Medline.CombineAntCandFrom3GramPatMid.java
- Medline.CombineAntCandFrom5GramPatMid.java
|
- ./output/PreCand/antCandPatMid.${KEY_WORD}.data.wc
- ./output/PreCand/keyWords.data
|
- ./output/PreCand/antCandPatMid.cand.data.raw
=> includes raw co-occurrences that occur once in 1 of the 10 keywords
- ./output/PreCand/antCandPatMid.cand.data.filtered
Heuristic filter rules:
=> includes filtered co-occurrences: those that occur in 3 of 9 keywords, do not include "other|E0044444", and are not self-aPairs
=> is the sum of the files: tag + tbd
- ./output/Cand/antCandPatMid.cand.data.tag
- ./output/candTagged/antCandPatMid.cand.data.tag.CC
- ./output/candTagged/antCandPatMid.cand.data.tag.tagged
- ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd
|
- If running for the first time:
- shell> mkdir Cand
- shell> mkdir candTagged
- copy ${PreCand}/keyWords.data from ${PREV_YEAR}
- TBD should be 0
- If not, copy ./Cand/antCandPatMid.cand.data.tbd to antCandPatMid.cand.data.tbd.${YEAR}.${NO}
- send the candidates in ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO} to linguists for tagging
- put the tagged file at ./Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged
| 75
|
76 |
- Validate and fix tags of antonym candidates (CC)
- Antonym.ValidateTaggedCand.java
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged
- ${ANT_DIR}/input/domain.data
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.fixed
|
- Prepare/add tagged candidates to ./candTagged/tagged.data.tag.tagged
- copy ./Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged to ./candTagged/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged
- convert tagged candidate file to standard format:
shell> flds 3,4,5,6,7,8,9,10,11,12 antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged > antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12
- append
antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12 to antCandPatMid.data.tag.tagged.${YEAR}.${NO}
- sort -u antCandPatMid.data.tag.tagged.${YEAR}.${NO} > antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort
shell> cp -p antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort antCandPatMid.data.tag.tagged
- run this step (76) until the tagged and fixed files are the same
- The fixed file contains the auto-fixes of [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE].
- Manually copy the fixed file to the tagged file, then run the step again until they are the same
- Manually copy antCandPatMid.data.tag.tagged to antCandPatMid.data.tag.tagged.${YEAR}
| 76
|
77 |
- Update release antonyms tagged file from CC
- Antonym.UpdateAllTaggedFile.java
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged.${YEAR}
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
|
- ${ANT_DIR}/input/antCand.data.tag.updated
|
- This step auto-updates the tag file of all antonym candidates
- Manually copy antCand.data.tag.updated to antCand.data.tag.updated.CC
- Manually copy antCand.data.tag.updated to antCand.data.tag.${YEAR}
- The output file is used to generate antonym and negation files for the release.
- Re-run steps 75-77 until all steps pass
- Re-run steps 75-77 to generate the latest aPair candidate list for linguists
| 77
|
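The file preparation described in the notes for step 76 could be scripted roughly as follows. This is only a sketch under stated assumptions: the fields of the tagged file are assumed to be separated by "|", the standard `cut` command stands in for the local `flds` tool, and the function names `prepare_tagged` and `tags_converged` are hypothetical. It is meant to run inside ./candTagged after the linguist-tagged file has been copied there.

```shell
#!/bin/sh
# Sketch of the step-76 file preparation (not the project's actual script).
# Assumptions: '|' is the field separator; `cut` replaces the local `flds`.

prepare_tagged() {
    year="$1"   # value of ${YEAR}
    no="$2"     # value of ${NO}
    tbd="antCandPatMid.cand.data.tbd.${year}.${no}.tagged"
    std="antCandPatMid.data.data.tbd.${year}.${no}.tagged.3-12"
    all="antCandPatMid.data.tag.tagged.${year}.${no}"

    # Convert the linguist-tagged file to standard format: keep fields 3-12.
    cut -d'|' -f3-12 "$tbd" > "$std" || return 1
    # Append the converted candidates to the accumulated tagged file.
    cat "$std" >> "$all"
    # Sort and drop duplicate lines.
    sort -u "$all" > "${all}.uSort"
    # Install the result as the input file of step 76.
    cp -p "${all}.uSort" antCandPatMid.data.tag.tagged
}

# Succeeds (exit status 0) when the tagged and fixed files are identical,
# i.e. when step 76 no longer needs to be re-run.
tags_converged() {
    cmp -s antCandPatMid.data.tag.tagged antCandPatMid.data.tag.fixed
}
```

A typical use would be to call `prepare_tagged` with the year and batch number, run step 76, and repeat the manual copy of the fixed file to the tagged file until `tags_converged` succeeds.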