SPECIALIST Lexicon

Antonym - Processes for Annual Release and Stats Reports

base directory: ${ANTONYM_DIR}
binary scripts: ./bin
data: ./data
- 0.Antonym
I. Prerequisites
Must complete updates on aPairs from LEX, SD, PD, (TT), CC, SN

shell> cd ${ANTONYM_DIR}/bin
shell> GetAntonyms ${YEAR}
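
Before launching the script, the environment can be sanity-checked. The following is a minimal sketch, not part of the Antonym tooling: the function name `preflight` is hypothetical, and it assumes ${YEAR} is a four-digit year and that the scripts live in ${ANTONYM_DIR}/bin as stated above.

```shell
# Hypothetical pre-flight check (not part of the Antonym scripts).
# Usage: preflight <base-dir> <year>
preflight() {
    dir=$1
    year=$2
    # Assumption: the release year is written with four digits, e.g. 2025.
    case $year in
        [0-9][0-9][0-9][0-9]) ;;
        *) echo "YEAR must be four digits, got '$year'" >&2; return 1 ;;
    esac
    # The binary scripts are expected under <base-dir>/bin.
    if [ ! -d "$dir/bin" ]; then
        echo "missing directory: $dir/bin" >&2
        return 1
    fi
}

# Example:
#   preflight "${ANTONYM_DIR}" "${YEAR}" && cd "${ANTONYM_DIR}/bin" && GetAntonyms "${YEAR}"
```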

II. Processes

Generate aPairs, negation cue words, and antonym files

Option | Description | Input | Output | Notes

generate aPairs from tagged candidates
Antonym.GenAPairsFromTagCand.java

${ANT_DIR}/input/antCand.data.tag.${YEAR}
${ANT_DIR}/input/domain.data
${LEX_DIR}/input/LRSPL

./output/aPairs.data

This program generates aPairs with all spVars.
This program removes aPairs duplicated by spVars from different sources.
The result still includes some duplicated aPairs that arise when different sources list the same aPair in a different order. They are taken care of in Step-3.
This is the antonym file that contains unique aPairs.
Manually copy aPairs.data to aPairs.data.${YEAR}.
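
The manual copy can be sketched as a small shell helper. This is an illustration only: the function name `snapshot_for_year` is hypothetical, and the check for a missing or empty file is an added safeguard, not something the original process prescribes.

```shell
# Hypothetical helper: copy <dir>/<name> to <dir>/<name>.<year>.
# Refuses to copy when the source file is missing or empty.
snapshot_for_year() {
    dir=$1
    name=$2
    year=$3
    src="$dir/$name"
    if [ ! -s "$src" ]; then
        echo "snapshot_for_year: $src is missing or empty" >&2
        return 1
    fi
    # -p preserves the modification time of the generated file.
    cp -p "$src" "$src.$year"
}

# Example:
#   snapshot_for_year ./output aPairs.data "${YEAR}"
```

The same helper would apply to the other yearly copies described in this document (negCueWords.data and antonyms.data).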

generate negation cue words from tagged candidates
Antonym.GenNegCueWordsFromTagCand.java

${ANT_DIR}/input/antCand.data.tag.${YEAR}
${LEX_DIR}/input/LRSPL

./output/negCueWords.data

This is the negation cue word file (unique).
manually copy negCueWords.data to negCueWords.data.${YEAR}

Generate antonyms release file from results of step-1 (DB table for Lexical Tools)
Antonym.GenAntFromAPairs.java

./output/aPairs.data.${YEAR}

./output/antonyms.data
./output/antonyms.data.tagConflict
=> Must be 0, if not:
- send ./output/antonyms.data.tagConflict to the linguist to resolve the tags on the same aPairs.
- manually fix ./input/antCand.data.tag
- re-run steps 1, 2, 3 (update aPairs.data.${YEAR}) until the tag conflict count is 0
- then fix the tag duplicates.
./output/antonyms.data.tagDuplicate
=> Must be 0, if not:
- The duplicated tag is caused by spVars.
- manually review and fix ./input/antCand.data.tag
- delete the duplicates (keeping the smaller EUI as EUI-1); search only for the EUIs listed in ./input/antonyms.data.tagDuplicate
- re-run steps 1, 2, 3 (update aPairs.data.${YEAR}) until the tag duplicate count is 0
- then fix the src duplicates.
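
The "must be 0" checks on the two tag log files can be sketched as a shell gate. This is a sketch under assumptions: it treats each non-blank line of a log file as one entry, and the function names `count_entries` and `check_tag_logs` are hypothetical.

```shell
# Print the number of non-blank lines in a file (0 if the file is absent).
# Assumption: one conflict or duplicate entry per line.
count_entries() {
    if [ -f "$1" ]; then
        # grep -c prints 0 but exits non-zero when nothing matches.
        grep -c '[^[:space:]]' "$1" || true
    else
        echo 0
    fi
}

# Return 0 only if both tag log files in <output-dir> are empty.
check_tag_logs() {
    out=$1
    status=0
    for log in antonyms.data.tagConflict antonyms.data.tagDuplicate; do
        n=$(count_entries "$out/$log")
        if [ "$n" -ne 0 ]; then
            echo "$log: $n entries; fix ./input/antCand.data.tag and re-run" >&2
            status=1
        fi
    done
    return $status
}

# Example:
#   check_tag_logs ./output && echo "tag conflicts and duplicates are 0"
```
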
./output/antonyms.data.srcConflict
- The program auto-fixes the src in the following priority order (LEX > SD > PD > CC > SN) if the same aPair is tagged from multiple sources
- The fixes are applied to the input (./output/aPairs.data.${YEAR}) and the result is written to the output (./output/antonyms.data).
- All src conflicts are in the log file ./output/antonyms.data.srcConflict
  => In general, no action is needed because the program resolves conflicts by reassigning the src and removing the entries that are not needed. However, we can spot-check the following:
  - review conflicts in the log file ./output/antonyms.data.srcConflict
  - check src conflicts in the source input file ./output/aPairs.data.${YEAR} with multiple sources
  - ensure only 1 src exists in the output file ./output/antonyms.data
  - all fixed conflicts are kept in the known exceptions for reference (./output/antonyms.data.srcConflict.${YEAR}.${NO}.known).
  - known source conflicts history:
    
    Year | Exception No.
    2023 | 3
    2024 | 8
    2025 | 68
    2026 | 91
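
The source priority rule (LEX > SD > PD > CC > SN) can be illustrated with a short awk filter. This is not the logic of Antonym.GenAntFromAPairs.java, only a sketch of the rule: the two-field input format `pairKey|src` and the function name `resolve_src` are assumptions made for the example, and the real aPairs file layout may differ.

```shell
# Hypothetical input: one record per line, "pairKey|src".
# Prints one line per pairKey with its highest-priority src
# (LEX > SD > PD > CC > SN), in order of first appearance.
resolve_src() {
    awk -F'|' '
        BEGIN {
            rank["LEX"] = 1; rank["SD"] = 2; rank["PD"] = 3
            rank["CC"] = 4; rank["SN"] = 5
        }
        # Sources outside the priority list are reported and skipped.
        !($2 in rank) { print "unknown src: " $0 > "/dev/stderr"; next }
        # First time this pair is seen: remember it and its src.
        !($1 in best) { order[++n] = $1; best[$1] = $2; next }
        # Seen before: keep the src with the smaller (better) rank.
        rank[$2] < rank[best[$1]] { best[$1] = $2 }
        END { for (i = 1; i <= n; i++) print order[i] "|" best[order[i]] }
    ' "$@"
}

# Example:
#   printf 'hot/cold|SN\nhot/cold|LEX\n' | resolve_src
```

A spot check of ./output/antonyms.data could compare its src values against what such a filter produces for the pairs listed in the srcConflict log.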

This is the antonym release (also used as the DB table for Lexical tools).
manually copy antonyms.data to antonyms.data.${YEAR}

Get stats on tagged antonym candidate file

${ANT_DIR}/input/antCand.data.tag.${YEAR}

./output/analysis/antCand.data.tag.stats
./output/analysis/domain.out.cand

If running for the first time: shell> mkdir ${OUTPUT}/analysis
Generate stats and domains from antonym candidate tagged file
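
As an alternative to creating the directory by hand on the first run, `mkdir -p` can be called every time, since it succeeds when the directory already exists. A minimal sketch, with the hypothetical function name `ensure_analysis_dir`:

```shell
# Create <output-dir>/analysis if it does not exist yet.
# mkdir -p is a no-op when the directory is already there.
ensure_analysis_dir() {
    mkdir -p "$1/analysis"
}

# Example:
#   ensure_analysis_dir "${OUTPUT}"
```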

Get stats on canonical antonym from tagged candidate file

${ANT_DIR}/input/antCand.data.tag.${YEAR}

./output/analysis/antCand.data.tag.canon.stats
./output/analysis/domain.out.cand.canon

Generate stats and domains from canonical antonym in tagged file

Get stats on antonym file

./output/antonyms.data

./output/antonyms.data.2-10
./output/analysis/antonym.data.stats
./output/analysis/domain.out.antonym

Generate stats and domains from antonym file
This file is used to update antonym growth.
