Phonetic Exceptions - Heuristic Rules
Introduction
The phonetic algorithms Double Metaphone and Caverphone 2.0 are used to identify whether terms have the same pronunciation. However, the precision of these algorithms is not 100%. Heuristic rules were derived from Lexicon.2015 to correct these exceptions (false positives). They are described as follows:
- This list was developed to retrieve inflectional spelling variants from Lexicon.2015
- Terms that match the IRREG patterns were retrieved, i.e., terms that have the same EUI, POS, inflections, phonetic codes, etc.
- These terms were sent to linguists, who tagged them as valid or invalid spelling variants (spVars).
- Over 100 heuristic rules were derived from the tagging results.
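The retrieval step above is essentially a filter over pairs of lexical entries. The sketch below illustrates that filter under loudly stated assumptions: the `LexRecord` type, its field names, and the sample EUI and phonetic codes are hypothetical placeholders, and each entry's phonetic code is assumed to be precomputed (for example by Double Metaphone or Caverphone 2.0). It only covers the attributes named in the text, not the "etc.".

```java
import java.util.ArrayList;
import java.util.List;

class CandidatePairs {
    // Returns every pair of distinct terms that share EUI, POS, inflection and phonetic code.
    // A plain pairwise scan is used for clarity; grouping by a composite key would scale better.
    static List<String[]> findCandidates(List<LexRecord> records) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            LexRecord a = records.get(i);
            for (int j = i + 1; j < records.size(); j++) {
                LexRecord b = records.get(j);
                boolean samePattern = !a.term().equals(b.term())
                        && a.eui().equals(b.eui())
                        && a.pos().equals(b.pos())
                        && a.inflection().equals(b.inflection())
                        && a.phoneticCode().equals(b.phoneticCode());
                if (samePattern) {
                    pairs.add(new String[] {a.term(), b.term()});
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Illustrative entries; the EUI and phonetic codes are made-up placeholders.
        List<LexRecord> records = List.of(
                new LexRecord("focused", "E0000001", "verb", "past", "FKST"),
                new LexRecord("focussed", "E0000001", "verb", "past", "FKST"),
                new LexRecord("focuses", "E0000001", "verb", "pres3s", "FKSS"));
        for (String[] pair : findCandidates(records)) {
            System.out.println(pair[0] + " | " + pair[1]); // candidate pair to send for tagging
        }
    }
}

// Hypothetical lexical entry (Java 16+ record); phoneticCode is assumed to be precomputed.
record LexRecord(String term, String eui, String pos, String inflection, String phoneticCode) {}
```

Only "focused | focussed" survives the filter here, because "focuses" differs in inflection and phonetic code. Each surviving pair is the kind of candidate that would then go to linguists for valid/invalid tagging.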
PhoneticExceptionPattern.java
- These heuristic rules are read in and loaded into a Map:
  - key (String): src-suffix|tar-suffix
  - value (PhoneticExceptionObj): src-suffix|tar-suffix|flag
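A rough illustration of how such a Map might be built follows. It is a sketch, not the code in PhoneticExceptionPattern.java: the rule-file line format `srcSuffix|tarSuffix|FLAG`, the `#` comment convention, the `ExceptionRule` record (a hypothetical stand-in for PhoneticExceptionObj), and the sample rules are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExceptionRuleLoader {
    // Parses lines of the assumed form "srcSuffix|tarSuffix|FLAG" into a map
    // keyed by "srcSuffix|tarSuffix".
    static Map<String, ExceptionRule> load(List<String> lines) {
        Map<String, ExceptionRule> rules = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            // limit -1 keeps empty fields, so an empty suffix is still parsed
            String[] fields = trimmed.split("\\|", -1);
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            ExceptionRule rule = new ExceptionRule(fields[0], fields[1], IrregFlag.valueOf(fields[2]));
            rules.put(rule.key(), rule);
        }
        return rules;
    }

    public static void main(String[] args) {
        // Invented sample rules, not taken from the real rule file.
        List<String> lines = List.of(
                "# src-suffix|tar-suffix|flag",
                "sed|ssed|IRREG_YES",
                "ing|eing|IRREG_NO");
        Map<String, ExceptionRule> rules = load(lines);
        System.out.println(rules.get("sed|ssed").flag()); // IRREG_YES
    }
}

enum IrregFlag { IRREG_NO, IRREG_YES, IRREG_TBD }

// Hypothetical stand-in for PhoneticExceptionObj (Java 16+ record).
record ExceptionRule(String srcSuffix, String tarSuffix, IrregFlag flag) {
    String key() {
        return srcSuffix + "|" + tarSuffix;
    }
}
```

Keying the map on the joined `src-suffix|tar-suffix` string makes each rule lookup a single hash probe, while the value object keeps the flag alongside both suffixes.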
- All input pairs are checked in both the forward and backward directions and assigned a flag:
  - IRREG_NO: invalid (different pronunciation)
  - IRREG_YES: valid (same pronunciation)
  - IRREG_TBD: not covered by the current heuristic rules; a new rule needs to be added
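The forward/backward check can be sketched as follows, assuming a simplified rule map from `src-suffix|tar-suffix` to a flag. How the real code derives the suffixes is not stated here, so this example assumes one plausible approach: strip the longest common prefix of the two terms, then extend both suffixes back into the shared stem one character at a time, trying the forward key and then the reversed (backward) key at each step. A backward match is assumed to carry the same flag. The sample rules are invented.

```java
import java.util.Map;

class PhoneticExceptionCheck {
    // Length of the longest common prefix of a and b.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Looks up suffix rules for (src, tar) in both directions; IRREG_TBD if no rule covers the pair.
    static IrregFlag check(String src, String tar, Map<String, IrregFlag> rules) {
        int lcp = commonPrefixLength(src, tar);
        // Start from the minimal differing suffixes, then extend both into the shared stem.
        for (int k = lcp; k >= 0; k--) {
            String srcSuffix = src.substring(k);
            String tarSuffix = tar.substring(k);
            IrregFlag forward = rules.get(srcSuffix + "|" + tarSuffix);
            if (forward != null) {
                return forward;
            }
            IrregFlag backward = rules.get(tarSuffix + "|" + srcSuffix);
            if (backward != null) {
                return backward;
            }
        }
        return IrregFlag.IRREG_TBD;
    }

    public static void main(String[] args) {
        // Invented sample rules for illustration only.
        Map<String, IrregFlag> rules = Map.of(
                "sed|ssed", IrregFlag.IRREG_YES,
                "ing|eing", IrregFlag.IRREG_NO);
        System.out.println(check("focused", "focussed", rules)); // IRREG_YES (forward rule)
        System.out.println(check("focussed", "focused", rules)); // IRREG_YES (backward rule)
        System.out.println(check("color", "colour", rules));     // IRREG_TBD (no rule)
    }
}

enum IrregFlag { IRREG_NO, IRREG_YES, IRREG_TBD }
```

Extending the suffixes into the stem lets a rule carry a little context (here `sed|ssed` rather than the minimal `ed|sed`). Any pair that falls through to IRREG_TBD marks a gap in the rule set.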