Step | Description | Inputs | Outputs | Notes
|
---|
Pre-Process:
|
0 | - Update the latest valid and invalid LMW list
| | | - Update candidates
- Run ${LMW_DIR}/bin/00.CandidateList, steps 1-4
=> Setup: link the latest Lexicon and inflVars from the LexBuild daily backup to ${LMW_DIR}/data/current/inData/.
=> After running 00.CandidateList, two files used in the steps below are auto-updated:
- ${LMW_DIR}/data/current/inData/notLmw.data.current -> ${LMW_DIR}/data/Candidates/totalTerms.all.lmw.no
- ${LMW_DIR}/data/current/inData/notBase.data.current -> ${LMW_DIR}/data/Candidates/totalTerms.all.base.no
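The two links above can be sanity-checked before moving on to step 1. The following is a minimal Python sketch, not part of the LMW tooling: the helper name check_links and the scratch directory standing in for ${LMW_DIR} are assumptions made for illustration, and only the link and target file names come from the notes above.

```python
import os
import tempfile
from pathlib import Path

# Link name under data/current/inData -> target name under data/Candidates,
# as listed in the step-0 notes.
EXPECTED_LINKS = {
    "notLmw.data.current": "totalTerms.all.lmw.no",
    "notBase.data.current": "totalTerms.all.base.no",
}

def check_links(lmw_dir: Path) -> dict:
    """Return {link name: bool}: does each link resolve to its expected target?"""
    in_dir = lmw_dir / "data" / "current" / "inData"
    cand_dir = lmw_dir / "data" / "Candidates"
    result = {}
    for link_name, target_name in EXPECTED_LINKS.items():
        link = in_dir / link_name
        target = cand_dir / target_name
        result[link_name] = (
            link.is_symlink()
            and os.path.realpath(link) == os.path.realpath(target)
        )
    return result

# Demo on a scratch directory that stands in for ${LMW_DIR} (hypothetical layout).
scratch = Path(tempfile.mkdtemp())
(scratch / "data" / "current" / "inData").mkdir(parents=True)
(scratch / "data" / "Candidates").mkdir(parents=True)
for link_name, target_name in EXPECTED_LINKS.items():
    target = scratch / "data" / "Candidates" / target_name
    target.write_text("")
    (scratch / "data" / "current" / "inData" / link_name).symlink_to(target)

status = check_links(scratch)
print(status)
```

Any entry reported as False would suggest that 00.CandidateList, steps 1-4, has not been run or that the link points somewhere else.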
|
Process:
|
1 | Generate candidate list from Abb/Acr expansion
| - ${IN_DIR}/LEXICON (input)
- ${IN_DIR}/inflVars.data (valid LMWs)
- ${CUR_DIR}/notBase.data.current
=> linked to ${LMW_DIR}/data/Candidates/totalTerms.1_2.base.no
=> auto-updated after running ${LMW_DIR}/bin/00.CandidateList, steps 1-4
- ${LMW_DIR}/data/${YEAR}/inData/abbAcrExpansions.data.hasEui.Exception.${YEAR} (modified from the previous year's file)
| - abbAcrExpansions.tag (all tags)
- abbAcrExpansions.invEui (the cross-ref EUI is invalid)
- abbAcrExpansions.hasEui (no cross-ref EUI, but the expansion matches EUIs)
- abbAcrExpansions.rpt (summary report)
- abbAcrExpansions.data.cand (candidate list)
=> manually copy to ./Cand/abbAcrExpansions.data.cand.${YEAR}
=> link to ./Stats/abbAcrExpansions.data.cand.${YEAR}
=> the first time through, go to step 10 to generate the candidate list
=> then repeat steps 0-2 until abbAcrExpansions.data.cand is empty (0)
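The stopping condition for the loop above ("until abbAcrExpansions.data.cand is empty") can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, the demo file content is made up, and it assumes one candidate per non-blank line, which the source does not state.

```python
import tempfile
from pathlib import Path

def count_candidates(cand_file: Path) -> int:
    """Count non-blank lines; assumes one candidate per line."""
    with cand_file.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())

def needs_another_pass(cand_file: Path) -> bool:
    """True while candidates remain, i.e. steps 0-2 should be repeated."""
    return count_candidates(cand_file) > 0

# Demo with a scratch file standing in for abbAcrExpansions.data.cand.
demo = Path(tempfile.mkdtemp()) / "abbAcrExpansions.data.cand"
demo.write_text("made-up candidate 1\nmade-up candidate 2\n", encoding="utf-8")
first = needs_another_pass(demo)    # two candidates remain

demo.write_text("", encoding="utf-8")
second = needs_another_pass(demo)   # file is empty, loop can stop
print(first, second)
```

A missing file raises FileNotFoundError rather than counting as empty, so a step that never ran is not mistaken for a finished one.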
|
|
2 | Split the invalid cross-ref EUI file and the no cross-ref EUI (expansion matches EUIs) file
| - abbAcrExpansions.data.invEui
- abbAcrExpansions.data.hasEui
| - abbAcrExpansions.data.invEui.NO_EUI
=> Send to linguist to tag [D]
- [D]: if the CR of the expansion is a deleted record (invalid LMW), the cross-ref EUI should be manually removed.
- Others: the expansion is a valid LMW. This case might require changing the expansion to citation form, restoring the deleted record, or creating a new lexRecord and modifying the CR-EUI, etc.
=> update ${LEX_CHECK}/data/File/notBaseForm.data.${YEAR}
- this file should be empty after the update (notBaseForm.data)
- abbAcrExpansions.data.invEui.WRONG_CIT
=> wrong citation; after the fixes, this file should be empty
- abbAcrExpansions.data.hasEui.E
=> Exceptions, expansion has 1 matched EUI
=> Send to linguist to tag:
- [C]: correct; the expansion is an invalid LMW and should not have a cross-ref EUI. No fix in LB.
- [Y]: if the suggested matched EUI is correct, manually add the EUI to the lexRecord in LB.
- [- EUI: E0xxxxxxx]: the expansion is a valid LMW; if the suggested matched EUI is not correct, add the correct EUI to the end of the line. Also fix it in LB.
- abbAcrExpansions.data.hasEui.M
=> Exceptions, expansion has multiple matched EUIs
=> Send to linguist to tag:
- [C]: correct; the expansion should not have a cross-ref EUI (even if the spelling is a valid base) => add to abbAcrExpansions.data.hasEui.Exception.${YEAR}
- [Y]: if the 1 matched EUI is correct (the Lexicon needs to be updated in LexBuild)
- EUI: add the correct EUI; this might require updating the cross-ref EUI, modifying the expansion, or adding a new record (if the expansion is an LMW) to the Lexicon
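The split of hasEui records into the .E (one matched EUI) and .M (multiple matched EUIs) files can be pictured with the sketch below. The record layout is an assumption, since the source does not describe it: each line is taken to be pipe-delimited with the matched EUIs comma-separated in the last field. The function name and the sample records are made up for illustration.

```python
def split_by_match_count(lines, sep="|", eui_sep=","):
    """Partition records by the number of matched EUIs.

    ASSUMPTION: the last field of each record holds the matched EUIs,
    separated by eui_sep. The real file layout may differ.
    """
    single, multiple = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        euis = [e for e in line.split(sep)[-1].split(eui_sep) if e]
        if len(euis) == 1:
            single.append(line)      # would go to *.hasEui.E
        elif len(euis) > 1:
            multiple.append(line)    # would go to *.hasEui.M
    return single, multiple

# Made-up sample records (placeholder terms and EUIs, not real Lexicon data).
sample = [
    "abbr one|expansion one|E0000001",
    "abbr two|expansion two|E0000002,E0000003",
]
one_match, many_matches = split_by_match_count(sample)
print(len(one_match), len(many_matches))
```

Records with an empty EUI field fall into neither list here; how the real step treats them is not stated in the source.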
|
|
Post-Process:
|
10 | Auto-tag candidate list
- CandidateUtil.FilterTagCandFile
| - ${STATS_DIR}/abbAcrExpansions.data.cand.${YEAR}
- ${LMW_DIR}/data/Candidates/0.LexiconInflVars/inflVars.data.current (valid LMWs)
=> ${LMW_DIR}/data/Candidates/0.LexiconInflVars/inflVars.data.current.1.uSort
- ${LMW_DIR}/data/Candidates/totalTerms.all.base.no (invalid LMWs)
=> generated from step-0 (00.CandidateList, steps 1-4)
|
Dir: ./Stats:
- abbAcrExpansions.data.cand.${YEAR}.autoTag (all tags)
- abbAcrExpansions.data.cand.${YEAR}.rmYesNo
This file must be empty (wc=0) once updates/tags are completed
- abbAcrExpansions.data.cand.${YEAR}.rmYesTagNo
=> Before the update, this file is used as the candidate list sent to the linguist
- No tag
- if the expansion is a valid LMW, add to Lexicon, add CR-EUI to the expansion
- notBaseFormUpdate.data.${YEAR}
- flds 4,2 abbAcrExpansions.data.cand.${YEAR}.rmYesTagNo.${YEAR} > notBaseFormUpdate.data.${YEAR}
- Append notBaseFormUpdate.data.${YEAR} to ${LexCheck}/data/Files/notBaseForm.data.${YEAR}
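The "flds 4,2" extraction above can be approximated in Python when the flds utility is not at hand. This sketch assumes that flds selects pipe-delimited fields by 1-based position and emits them in the order given; the function name select_fields and the sample line are hypothetical.

```python
def select_fields(line, field_numbers, sep="|"):
    """Return the requested 1-based fields of a delimited line, in the given order."""
    parts = line.rstrip("\n").split(sep)
    return sep.join(parts[n - 1] for n in field_numbers)

# Made-up record with five fields.
sample = "f1|f2|f3|f4|f5"
picked = select_fields(sample, (4, 2))
print(picked)  # prints f4|f2
```

Applied to every line of the rmYesTagNo file, the joined results would correspond to the content of notBaseFormUpdate.data.${YEAR}, under the stated assumption about the field layout.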
| After the candidate list is completed:
- Add/Link candidates to ${Candidates}/1.LexiconAbbAcrExpansion/abbAcrExpansions.data.cand.${YEAR}
- Run 00.CandidateList, steps 1-4
This step updates the valid and invalid LMW lists, and thus updates the candidates.
- Rerun steps 1-2 until *.cand is empty (0). Because candidates that are LMWs are now in the Lexicon, and invalid LMWs are tagged as invalid automatically (by the updated totalTerms.all.base.no from 00.CandidateList), no new candidates should be found.
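The reasoning above, that the candidate list drains once the valid and invalid lists are up to date, can be illustrated with a small sketch. It is not the CandidateUtil.FilterTagCandFile implementation: the function names, the "yes"/"no" tag values, the exact-string matching, and the sample terms are all assumptions made for illustration.

```python
def auto_tag(candidates, valid_lmws, invalid_lmws):
    """Tag each candidate 'yes' (valid LMW), 'no' (invalid), or None (untagged).

    ASSUMPTION: exact string matching; the real tool may normalize terms.
    """
    valid, invalid = set(valid_lmws), set(invalid_lmws)
    tagged = {}
    for term in candidates:
        if term in valid:
            tagged[term] = "yes"
        elif term in invalid:
            tagged[term] = "no"
        else:
            tagged[term] = None
    return tagged

def untagged(tagged):
    """Candidates that still need a linguist's decision."""
    return [term for term, tag in tagged.items() if tag is None]

# Made-up terms. First pass: one candidate is in neither list.
cands = ["term a", "term b", "term c"]
first_pass = untagged(auto_tag(cands, ["term a"], ["term b"]))

# After the linguist's decision is folded into the lists, nothing remains.
second_pass = untagged(auto_tag(cands, ["term a", "term c"], ["term b"]))
print(first_pass, second_pass)
```

In this picture, the first pass corresponds to the list sent to the linguist, and the empty second pass corresponds to the condition *.cand = 0.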
|