
CSpell

Informal Expression Handler: Correct Informal Expression

  • Description:
    This class corrects an informal expression either by inserting an apostrophe (') at the right position or by converting it to its formal expression. For example, [whos] -> [who's] and [plz] -> [please]. In general, ordinary spelling correction cannot make these corrections: the formal expression is either not in the dictionary (e.g. [who's]) or is at a large edit distance from the input. These inputs are not typos or spelling errors; they are shorthand (e.g. [plz] is a shorthand of [please] with an edit distance of 5).

  • Features:
    • Convert informal expressions to formal expressions.
      • Contractions: from the baseline (originally from Wikipedia)
      • Shorthand: [pls] -> [please], [u] -> [you], etc.
    • Conversion is driven by a configurable flat file.

  • Examples:

    File Name    Input    Output
    10138.txt    u        you
    10679.txt    b        be
    11186.txt    pls      please
    10.txt       im       i'm
    11186.txt    didnt    didn't
    16481.txt    shes     she's

  • Implementation Logic:
    • Read the conversions from a configurable flat file and store them in a local HashMap, with the informal expression as the key and the corrected expression as the value.
    • The conversion file (./data/Misc/informalExpression.data) has two fields:

      informal expression    correct expression
    • Tokenize the input text into input words.
    • Lowercase each input word.
    • Go through all keys:
      • If the input word matches a key, replace it with the corrected expression.
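
The steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code in InformalExpHandler.java: the class name InformalExpSketch, the method names, the '|' field separator, the '#' comment prefix, and whitespace tokenization are all assumptions made for the example. Because the keys live in a HashMap, a direct lookup stands in for "going through all keys".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the lookup logic; names and file format are assumed.
class InformalExpSketch {

    // Build the informal -> corrected map from the lines of the flat file.
    // Assumed format: one mapping per line, fields separated by '|',
    // blank lines and lines starting with '#' ignored.
    static Map<String, String> loadLines(Iterable<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\\|", 2);
            if (fields.length == 2) {
                // Keys are stored in lowercase to match the lowercased input word.
                map.put(fields[0].trim().toLowerCase(), fields[1].trim());
            }
        }
        return map;
    }

    // Lowercase the word and replace it if it is a known informal expression;
    // otherwise return the word unchanged.
    static String correctWord(String word, Map<String, String> map) {
        return map.getOrDefault(word.toLowerCase(), word);
    }

    // Tokenize on whitespace (a simplification) and correct each token.
    static String correctText(String text, Map<String, String> map) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : text.trim().split("\\s+")) {
            out.add(correctWord(token, map));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // In practice the lines could come from
        // Files.readAllLines(Path.of("./data/Misc/informalExpression.data")).
        List<String> lines = Arrays.asList(
            "u|you", "pls|please", "didnt|didn't", "shes|she's");
        Map<String, String> map = loadLines(lines);
        // prints: please tell me why you didn't call
        System.out.println(correctText("Pls tell me why u didnt call", map));
    }
}
```

The sketch does not strip punctuation attached to a token (e.g. "pls," would not match), and it replaces a matched word with the mapped value as-is, so the original capitalization of the input is not preserved.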

  • Notes:
    • Baseline source code: PreProcContractions.java
    • Enhancement: read the data from a configurable flat file (not hard-coded)
    • Action: redesigned and implemented
    • Negation contractions might also be correctable by the dictionary (e.g. isnt -> isn't)

  • Source Code: InformalExpHandler.java