The SPECIALIST Lexicon

Phonetic Exceptions - Heuristic Rules

Introduction

The Double Metaphone and Caverphone 2.0 phonetic algorithms are used to identify whether terms have the same pronunciation. However, the precision of these algorithms is not 100%. Heuristic rules are derived from Lexicon.2015 to correct these exceptions (false positives). They were developed as follows:

  • This list was developed to retrieve inflectional spelling variants from Lexicon.2015
  • Terms that match the IRREG patterns are retrieved. These are terms that have the same EUI, POS, inflections, phonetic codes, etc.
  • They are sent to linguists to be tagged as valid or invalid spelling variants (spVars)
  • Over 100 heuristic rules are derived based on the tagging results

    src-suffix | tar-suffix | Flag

PhoneticExceptionPattern.java

  • These heuristic rules are read in and loaded into a Map:
    • Key (String): src-suffix|tar-suffix
    • Value: PhoneticExceptionObj (src-suffix|tar-suffix|flag)
  • Check all input pairs in both forward and backward directions and assign a flag:
    • IRREG_NO: invalid (different pronunciation)
    • IRREG_YES: valid (same pronunciation)
    • IRREG_TBD: not covered by the current heuristic rules; a new rule needs to be added
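
The load-and-check steps above can be sketched roughly as follows. This is a minimal illustration, not the actual source of PhoneticExceptionPattern.java: the class name PhoneticExceptionSketch, the methods addRule and check, and the three sample rules are all hypothetical, and the way suffixes are extracted (trying every split point up to the end of the common prefix) is an assumption. For brevity the Map value is just the flag rather than a full PhoneticExceptionObj.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names and matching strategy are assumptions.
class PhoneticExceptionSketch {
    enum Flag { IRREG_NO, IRREG_YES, IRREG_TBD }

    // Key: "src-suffix|tar-suffix"; value: the flag tagged for that pair.
    private final Map<String, Flag> rules = new HashMap<>();

    // Load one rule line in the assumed form "src-suffix|tar-suffix|flag".
    public void addRule(String line) {
        String[] fields = line.split("\\|");
        if (fields.length != 3) {
            throw new IllegalArgumentException("Bad rule: " + line);
        }
        rules.put(fields[0] + "|" + fields[1], Flag.valueOf(fields[2]));
    }

    // Look up a term pair in both directions; IRREG_TBD if no rule applies.
    public Flag check(String src, String tar) {
        // Length of the prefix shared by both terms.
        int common = 0;
        int max = Math.min(src.length(), tar.length());
        while (common < max && src.charAt(common) == tar.charAt(common)) {
            common++;
        }
        // Try the shortest differing suffixes first, then longer ones.
        for (int i = common; i >= 0; i--) {
            String s = src.substring(i);
            String t = tar.substring(i);
            Flag forward = rules.get(s + "|" + t);
            if (forward != null) {
                return forward;
            }
            Flag backward = rules.get(t + "|" + s);
            if (backward != null) {
                return backward;
            }
        }
        return Flag.IRREG_TBD;
    }

    public static void main(String[] args) {
        PhoneticExceptionSketch p = new PhoneticExceptionSketch();
        // Invented rules for illustration only; not taken from Lexicon.2015.
        p.addRule("er|re|IRREG_YES");
        p.addRule("um|a|IRREG_NO");
        System.out.println(p.check("center", "centre")); // IRREG_YES (forward)
        System.out.println(p.check("centre", "center")); // IRREG_YES (backward)
        System.out.println(p.check("datum", "data"));    // IRREG_NO
        System.out.println(p.check("color", "colour"));  // IRREG_TBD (no rule)
    }
}
```

In this sketch a pair that returns IRREG_TBD would be a candidate for a new rule, which matches the "need to add" case described above.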