
The SPECIALIST Lexicon

Antonym Generation for CC Model

shell> cd ${ANTONYM_DIR}/bin
shell> GetAntonyms ${YEAR}

CC model: co-occurrence in a corpus

Uses the latest MEDLINE N-gram Set, the Lexicon, and STMT.

Option | Description | Input | Output | Notes | Option
70
${PREV_YEAR}AA
${PREV_YEAR}
  • Get antonyms from MEDLINE 3-grams by a specified middle keyword (and/or):
  • Medline.GetAntCandFrom3GramPatMid.java
  • ${ML_DIR}/input/3-gram.${ML_YEAR}.30.core
  • ${META_DIR}/input/normTermCui.data
  • ${META_DIR}/input/MRSTY.RRF
  • ${LEX_DIR}/input/inflVars.data
  • ${LEX_DIR}/input/synonym.data
  • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
  • ${ANT_DIR}/input/domain.data
  • ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
  • ./output/PreCand/antCandPatMid.andOr.data
  • This step is not part of the annual process, but it is used to debug a single keyword for Step 71.
  • It serves as a pre-run of Step 71, using 1 middle word in 3-grams to get collocates for antonyms. Run it to make sure everything works before running Step 71.
  • If running for the first time:
    • shell> mkdir ./output/PreCand
    • make sure all input files are set up correctly
  • Different data versions are used because the data sets have different release dates:
    • Lexicon Antonym release: ${YEAR}
    • META-thesaurus: ${PREV_YEAR}AA
    • MEDLINE: ${PREV_YEAR}
    • LVG: ${PREV_YEAR}
  • This program sets the default keyword to "and/or".
70
71
${PREV_YEAR}AA
${PREV_YEAR}
  • Get antonyms from MEDLINE 3-grams by specified middle keywords
  • Medline.GetAntCandFrom3GramPatMid.java
  • ${ML_DIR}/input/3-gram.${YEAR}.30.core
  • ${META_DIR}/input/normTermCui.data
  • ${META_DIR}/input/MRSTY.RRF
  • ${LEX_DIR}/input/inflVars.data
  • ${LEX_DIR}/input/synonym.data
  • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
  • ${ANT_DIR}/input/domain.data
  • ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
  • ./output/PreCand/antCandPatMid.${KEY_WORD}.data
  • Currently, this program includes the 9 highest-frequency keywords: [and], [or], [to], [versus], [than], [vs], [from], [nor], [and|or], as defined in the scripts.
  • The latest data come in different versions because the data sets have different release dates:
    • Lexicon Antonym release: ${YEAR}
    • Lexicon: ${YEAR}
    • META-thesaurus: ${PREV_YEAR}AA
    • MEDLINE: ${PREV_YEAR}
    • LVG: ${PREV_YEAR}
71
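The middle-keyword pattern used in steps 70 and 71 can be illustrated with a minimal Python sketch. It is not the code of Medline.GetAntCandFrom3GramPatMid.java. The line format ("count|w1 w2 w3"), the function name pairs_from_3grams, and the sample lines are all assumptions made for illustration; the real program also consults the Lexicon, Metathesaurus, and LVG inputs listed above, which this sketch ignores.

```python
from collections import Counter


def pairs_from_3grams(lines, keyword):
    """Collect (left, right) word pairs from 3-grams whose middle word is `keyword`.

    Assumed (hypothetical) line format: "<count>|<w1> <w2> <w3>".
    Returns a Counter mapping each pair to its summed 3-gram count.
    """
    pairs = Counter()
    for line in lines:
        count_str, _, ngram = line.strip().partition("|")
        words = ngram.split()
        # Skip anything that is not a well-formed 3-gram record.
        if len(words) != 3 or not count_str.isdigit():
            continue
        left, middle, right = words
        if middle.lower() == keyword:
            pairs[(left.lower(), right.lower())] += int(count_str)
    return pairs


# Made-up sample lines, for illustration only.
sample_3grams = [
    "120|increase or decrease",
    "45|male and female",
    "30|acute or chronic",
    "7|risk of death",
]
or_pairs = pairs_from_3grams(sample_3grams, "or")
```

Running the same function once per keyword would give one candidate set per keyword, loosely mirroring the per-keyword antCandPatMid.${KEY_WORD}.data outputs.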
72
  • Get antonyms from MEDLINE 5-grams by specified middle keywords
  • Medline.GetAntCandFrom5GramPatMid.java
  • ${ML_DIR}/input/5-gram.${YEAR}.30.core
  • ${META_DIR}/input/normTermCui.data
  • ${META_DIR}/input/MRSTY.RRF
  • ${LEX_DIR}/input/inflVars.data
  • ${LEX_DIR}/input/synonym.data
  • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
  • ${ANT_DIR}/input/domain.data
  • ${PROJECT_DIR}/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
  • ./output/PreCand/antCandPatMid.${KEY_WORD}.data
  • Currently, this program includes 1 keyword: "as well as", as defined in the scripts.
  • The latest data come in different versions because the data sets have different release dates:
    • Lexicon Antonym release: ${YEAR}
    • Lexicon: ${YEAR}
    • META-thesaurus: ${PREV_YEAR}AA
    • MEDLINE: ${PREV_YEAR}
    • LVG: ${PREV_YEAR}
72
75
  • Get antCand by combining the results from the above steps: 71 and 72
  • Medline.CombineAntCandFrom3GramPatMid.java
  • Medline.CombineAntCandFrom5GramPatMid.java
  • ./output/PreCand/antCandPatMid.${KEY_WORD}.data.wc
  • ./output/PreCand/keyWords.data (copy from ${PREV_YEAR})
  • ./output/PreCand/antCandPatMid.cand.data.raw
    => includes raw co-occurrences that occur once in 1 of the 10 keywords
  • ./output/PreCand/antCandPatMid.cand.data.filtered
    Heuristic filter rules:
    => includes filtered co-occurrences: those that occur in 3 of 9 keywords, do not include "other|E0044444", and are not self-aPairs
    => is the sum of the files: tag + tbd
  • ./output/Cand/antCandPatMid.cand.data.tag
  • ./output/candTagged/antCandPatMid.cand.data.tag.CC
  • ./output/candTagged/antCandPatMid.cand.data.tag.tagged
  • ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd
  • If running for the first time:
    • shell> mkdir Cand
    • shell> mkdir candTagged
    • copy ${PreCand}/keyWords.data from ${PREV_YEAR}
  • The TBD count should be 0
  • If not, copy ./Cand/antCandPatMid.cand.data.tbd to antCandPatMid.cand.data.tbd.${YEAR}.${NO}
  • send the candidates in ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO} to linguists to tag
  • put the tagged file at ./Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged
75
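The heuristic filter rules in step 75 can be sketched as follows. This is a hedged illustration, not the logic of the Combine programs: it assumes each candidate is a pair of "word|EUI" strings mapped to the set of keywords it co-occurred with, and it reads "3 of 9 keywords" as "at least 3". The function name and the EUIs in the sample (E_HOT and so on) are placeholders; only "other|E0044444" comes from the rules above.

```python
EXCLUDED_TERM = "other|E0044444"  # term excluded by the filter rule
MIN_KEYWORDS = 3                  # assumption: "3 of 9" means at least 3


def filter_candidates(cooccurrences):
    """Apply the three heuristic rules to raw co-occurrences.

    cooccurrences: dict mapping (term1, term2) to the set of keywords
    the pair was seen with (a hypothetical in-memory representation).
    """
    kept = []
    for (term1, term2), keywords in cooccurrences.items():
        if len(keywords) < MIN_KEYWORDS:
            continue  # seen with too few keywords
        if EXCLUDED_TERM in (term1, term2):
            continue  # pair involves the excluded term
        if term1 == term2:
            continue  # self-pair
        kept.append((term1, term2))
    return sorted(kept)


# Made-up sample; the EUIs other than E0044444 are placeholders.
sample_cooccurrences = {
    ("hot|E_HOT", "cold|E_COLD"): {"and", "or", "versus"},
    ("hot|E_HOT", "warm|E_WARM"): {"and", "or"},
    ("other|E0044444", "same|E_SAME"): {"and", "or", "to"},
    ("hot|E_HOT", "hot|E_HOT"): {"and", "or", "than"},
}
filtered = filter_candidates(sample_cooccurrences)
```

In this sample only the first pair survives: the second has too few keywords, the third involves the excluded term, and the fourth is a self-pair.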
76
  • Validate and fix tags of antonym candidates (CC)
  • Antonym.ValidateTaggedCand.java
  • ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged
  • ${ANT_DIR}/input/domain.data
  • ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.fixed
  • Manually prepare/add tagged candidates to ./candTagged/tagged.data.tag.tagged
    • copy ../../../${PREV_YEAR}/output/candTagged/antCandPatMid.data.tag.tagged.${PREV_YEAR} ./candTagged/antCandPatMid.data.tag.tagged.${PREV_YEAR}
    • copy ./candTagged/antCandPatMid.data.tag.tagged.${PREV_YEAR} ./candTagged/antCandPatMid.data.tag.tagged.${YEAR}.0
    • copy ./Cand/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged ./candTagged/antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged
    • convert tagged candidate file to standard format:
      shell> flds 3,4,5,6,7,8,9,10,11,12 antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged > antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12
    • append:
      shell> cat antCandPatMid.data.tag.tagged.${YEAR}.${NO}.0 antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12 > antCandPatMid.data.tag.tagged.${YEAR}.${NO}.1
    • sort -u antCandPatMid.data.tag.tagged.${YEAR}.${NO} > antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort
    • shell> cp -p antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort antCandPatMid.data.tag.tagged
  • Run this step (76) until the tagged and fixed files are the same
    • The fixed file contains the automatic fixes of [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE].
    • If any [NEF_TBD] exist, send them to linguists to tag, then fix them.
    • Manually copy the fixed file to the tagged file, then run the step again until they are the same
  • Manually copy antCandPatMid.data.tag.tagged to antCandPatMid.data.tag.tagged.${YEAR}
76
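The "run until the tagged and fixed files are the same" loop in step 76 is a fixed-point iteration, sketched below under stated assumptions. It is not the code of Antonym.ValidateTaggedCand.java. It assumes the two auto-fixes map respectively ([TYPE_TBD] to [NA], [DOMAIN_TBD] to [DOMAIN_NONE]), treats records as plain strings, and uses hypothetical function names and sample records. Tags that need a linguist, such as [NEF_TBD], are left untouched.

```python
# Assumed pairing of the auto-fixes described in the notes.
AUTO_FIXES = {"[TYPE_TBD]": "[NA]", "[DOMAIN_TBD]": "[DOMAIN_NONE]"}


def fix_tags(lines):
    """Return a copy of `lines` with the automatic tag fixes applied."""
    fixed = []
    for line in lines:
        for old, new in AUTO_FIXES.items():
            line = line.replace(old, new)
        fixed.append(line)
    return fixed


def run_until_stable(tagged, max_rounds=10):
    """Repeat the fix step until the tagged and fixed contents are identical."""
    for round_no in range(1, max_rounds + 1):
        fixed = fix_tags(tagged)
        if fixed == tagged:
            return tagged, round_no
        tagged = fixed  # corresponds to copying the fixed file over the tagged file
    raise RuntimeError("tagged and fixed files did not converge")


# Made-up records, for illustration only.
sample_tagged = [
    "hot|cold|[TYPE_TBD]|[DOMAIN_TBD]",
    "up|down|[NA]|[DOMAIN_NONE]",
]
stable, rounds = run_until_stable(sample_tagged)
```

The first round applies the fixes, and the second round confirms that nothing changes, which is the stopping condition the notes describe.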
77
  • Update the release antonyms tagged file from CC
  • Antonym.UpdateAllTaggedFile.java
  • ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged.${YEAR}
  • ${ANT_DIR}/input/antCand.data.tag.${YEAR}
  • ${ANT_DIR}/input/domain.data
  • ${ANT_DIR}/input/antCand.data.tag.updated
  • This step auto-updates the antonym candidate tag file
  • Manually copy antCand.data.tag.updated to antCand.data.tag.updated.CC
  • Manually copy antCand.data.tag.updated to antCand.data.tag.${YEAR}
  • The output file is used to generate antonym and negation files for the release.
  • Fix all conflicts in step 77:
    • The tag conflict number must be 0; otherwise, send antCand.data.tag.updated.tagConflict to linguists to fix the conflicts.
  • Re-run steps 75-77 until it passes all steps
  • Re-run steps 75-77 to generate the latest aPair candidate list for linguists
77
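The tag conflict check in step 77 might look like the sketch below. The notes do not define what counts as a conflict, so this assumes one plausible reading: the same antonym pair carrying more than one distinct tag across the merged records. The function name, the record shape (pair key, tag), and the sample values are all hypothetical.

```python
from collections import defaultdict


def find_tag_conflicts(records):
    """Return pairs that carry more than one distinct tag.

    records: iterable of (pair_key, tag) tuples, an assumed representation
    of the merged tagged candidates.
    """
    tags_by_pair = defaultdict(set)
    for pair_key, tag in records:
        tags_by_pair[pair_key].add(tag)
    return {
        pair_key: sorted(tags)
        for pair_key, tags in tags_by_pair.items()
        if len(tags) > 1
    }


# Made-up records, for illustration only.
sample_records = [
    ("hot|cold", "Y"),
    ("hot|cold", "N"),
    ("up|down", "Y"),
    ("up|down", "Y"),
]
conflicts = find_tag_conflicts(sample_records)
conflict_count = len(conflicts)  # the notes require this to be 0 before release
```

A non-zero count here corresponds to the case where the conflict file is sent to linguists before steps 75-77 are re-run.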