Generate LEXICON in pure ASCII format
This step must be completed before generating the LEXICON tables because LEXICON.release might need to be modified during this step.
I. Concept: Algorithm of Generating ASCII Lexicon
II. Pre-Process: Prepare data and files
shell> mkdir ${LEXICON_DIR}/data/${YEAR}/tables
shell> cd ${LEXICON_DIR}/data/${YEAR}/tables
shell> ln -sf ../data/LEXICON.release LEXICON
shell> mkdir ${LEXICON_DIR}/data/${YEAR}/ascii
shell> cp -rp ${LEXICON}/data/${PRE_YEAR}/ascii/exceptions ${LEXICON}/data/${YEAR}/ascii/exceptions
III. Process: Generate ASCII Lexicon
shell> ${LEXICON}/bin/3.GenerateAsciiLexicon <year>
${LVG_YEAR}
${LC_YEAR}
4.ReviewAsciiReports ${YEAR}
E0543077|base|delete|not-Lex|divorcé|divorce|N
E0702889|base|delete|not-Lex|Pécs|Pecs|N
E0710983|base|delete|not-Lex|GΩ|GOmega|N
E0721571|base|delete|not-Lex|μB|muB|N
Log
IV. Review ASCII Reports
shell> ${LEXICON}/bin/4.ReviewAsciiReports <year>
Exception files | Description | Action |
---|---|---|
invalidAsciiExceptions.txt | Invalid ASCII conversions that are deleted during line-by-line ASCII conversion | update |
EUI | Type | Action | Reason | non-ASCII | ASCII conversion | Tag (TBD) |
EUI | Base | Action (delete) | Cause (not in Lexicon) | Citation | ASCII conversion |
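The field layout above can be checked mechanically before the exception file is updated. The following is a minimal sketch, not part of the LEXICON scripts: the function name `check_exception_records` is hypothetical, and it assumes each record has seven pipe-separated fields (as in the sample lines), an EUI made of `E` followed by digits, and an ASCII conversion in the sixth field.

```shell
# Hypothetical helper (not one of the LEXICON bin scripts).
# Reads exception records from stdin, prints one "BAD <line>: <reason>"
# line per problem, and returns 1 if any problem was found.
check_exception_records() {
  # LC_ALL=C makes awk compare bytes, so any byte outside the
  # printable ASCII range (space through tilde) is flagged.
  LC_ALL=C awk -F'|' '
    NF != 7           { printf "BAD %d: expected 7 fields, got %d\n", NR, NF; bad = 1; next }
    $1 !~ /^E[0-9]+$/ { printf "BAD %d: unexpected EUI %s\n", NR, $1; bad = 1; next }
    $6 ~ /[^ -~]/     { printf "BAD %d: conversion is not printable ASCII\n", NR; bad = 1; next }
    END               { exit bad + 0 }
  '
}
```

It could be run as `check_exception_records < invalidAsciiExceptions.txt`; an empty output and a zero exit status would indicate that every record matches the assumed layout.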
Year | Notes |
---|---|
2014 | All 88 valid conversions are deleted in step 3. |
2015 | All 90 valid conversions are deleted in step 3 (93 valid exceptions). |
2016 | All 90 valid conversions are deleted in step 3 (93 valid exceptions). |
2017 | All 94 valid conversions are deleted in step 3 (97 valid exceptions). |
2018 | All 92 valid conversions are deleted in step 3 (97 valid exceptions). |
2019 | All 95 valid conversions are deleted in step 3 (100 valid exceptions). |
2020 | All 93 valid conversions are deleted in step 3 (100 valid exceptions). |
2021 | All 100 valid conversions are deleted in step 3 (107 valid exceptions). |
2022 | All 100 valid conversions are deleted in step 3 (107 valid exceptions). |
2023 | All 100 valid conversions are deleted in step 3 (107 valid exceptions). |
2024 | All 100 valid conversions are deleted in step 3 (107 valid exceptions). |
2025 | All 100 valid conversions are deleted in step 3 (107 valid exceptions). |
shell> cd /nfsvol/lex/Lu/Development/LVG/Components/Unicode/bin
shell> GetNonAsciiFromFile ${LEXICON.ascii} line char
shell> wc -l line
The count must be 0 (no non-ASCII Unicode characters).
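As an independent cross-check that does not rely on GetNonAsciiFromFile, the same condition can be tested with standard tools only. This is a minimal sketch under that assumption; the function name `count_non_ascii_bytes` is hypothetical and is not part of the LVG or LEXICON tooling.

```shell
# Hypothetical helper: count the bytes in a file that fall outside
# the 7-bit ASCII range (0x00-0x7F).
count_non_ascii_bytes() {
  # tr deletes every ASCII byte, so only non-ASCII bytes reach wc.
  # The arithmetic expansion strips the padding some wc versions add.
  echo $(( $(LC_ALL=C tr -d '\000-\177' < "$1" | wc -c) ))
}
```

Called with the path of the generated ASCII lexicon, it should print 0 when the file is pure ASCII. Note that it counts bytes rather than lines, so a single accented character encoded in UTF-8 contributes more than 1.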
V. Generate ASCII tables
shell> ${LEXICON}/bin/10.GenerateAsciiTables <year>
9
shell> ${LEXICON}/bin/10.GenerateAsciiTables <year>
10