
CSpell

Ending Digit Splitter

  • Description:
    This splitter splits a token by inserting a space before its ending digits, if the token ends with digits.

  • Features:
    Splits a token in front of its ending digits.

  • Examples:

    File Name    Input             Output
    26.txt       questions.1)      questions. 1)
    26.txt       hereditary2)      hereditary 2)
    26.txt       disease3)         disease 3)
    14849.txt    shuntfrom2007.    shuntfrom 2007.
    73.txt       jk5               jk 5

  • Implementation Logic:
    • Convert the input word to a coreTerm by stripping off leading and ending punctuation and spaces.
    • Check whether the coreTerm ends with digit(s) (see Matchers); if so:
      • Check whether the coreTerm matches any of the exceptions (see Filters); if not:
        • Add a space before the ending digit(s).
    • Convert the updated coreTerm back to the output term by restoring the stripped leading and ending characters.
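The steps above can be sketched in Java as follows. This is a minimal illustration, not the code in EndingDigitSplitter.java: the class name EndingDigitSplitSketch is hypothetical, "punctuation" is assumed to mean any character that is not a letter or digit, and only the matcher and the four filters listed in the Notes section are applied.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the ending-digit split steps.
 * Assumption: leading/ending "punctuation and spaces" = any non letter/digit character.
 */
final class EndingDigitSplitSketch {
    // Matcher: ends with digit(s), preceded by a letter or a period.
    private static final Pattern ENDS_WITH_DIGITS =
        Pattern.compile("^(.*)([a-zA-Z\\.]+)(\\d+)$");

    // Filters (exceptions); if any of them matches, the term is not split.
    private static final Pattern[] EXCEPTIONS = {
        Pattern.compile("^([A-Z]+)(\\d+)$"),
        Pattern.compile("^([a-zA-Z]+)-([a-zA-Z]+)(\\d+)$"),
        Pattern.compile("^(.*)(alpha|beta|gamma|delta|epsilon)(\\d)$"),
        Pattern.compile("^([a-zA-Z])(\\d+)$")
    };

    static String split(String inTerm) {
        // Step 1: strip leading and ending punctuation/spaces to get the coreTerm.
        int start = 0;
        int end = inTerm.length();
        while (start < end && !Character.isLetterOrDigit(inTerm.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetterOrDigit(inTerm.charAt(end - 1))) {
            end--;
        }
        String prefix = inTerm.substring(0, start);
        String coreTerm = inTerm.substring(start, end);
        String suffix = inTerm.substring(end);

        // Step 2: the coreTerm must end with digit(s) and match no exception.
        Matcher m = ENDS_WITH_DIGITS.matcher(coreTerm);
        if (m.matches() && !isException(coreTerm)) {
            // Step 3: insert a space before the ending digits (capture group 3).
            coreTerm = coreTerm.substring(0, m.start(3)) + " " + m.group(3);
        }
        // Step 4: restore the stripped characters around the updated coreTerm.
        return prefix + coreTerm + suffix;
    }

    private static boolean isException(String coreTerm) {
        for (Pattern p : EXCEPTIONS) {
            if (p.matcher(coreTerm).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] inputs = {"questions.1)", "hereditary2)", "shuntfrom2007.", "jk5", "CAD106", "c7"};
        for (String in : inputs) {
            System.out.println(in + " -> " + split(in));
        }
    }
}
```

Under these assumptions the sketch reproduces the examples table (for instance, shuntfrom2007. becomes shuntfrom 2007.), while terms caught by a filter, such as CAD106 or c7, are returned unchanged.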

  • Notes:
    • Baseline source code: PreProcSplit.java
    • Enhancement: does not use a dictionary
    • Action: Redesigned and implemented
    • Applies the non-dictionary splitter model, with matchers and filters implemented as regular expressions. They are described in the following tables:

      Matchers
      • Ends with digit(s)
        • Regular expression: ^(.*)([a-zA-Z\\.]+)(\\d+)$
        • Examples: disease3, 100.1, Co-Q10
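The matcher's capture groups determine where the space goes. The short Java demo below (the class name EndingDigitMatcherDemo is hypothetical) shows how the pattern decomposes each example. Because the leading (.*) is greedy, group 2 captures only the single letter or period immediately before the digits and group 3 captures the whole run of ending digits, so the split point is the start of group 3. A term made only of digits, such as 2007, does not match, since the pattern requires a letter or period before the digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EndingDigitMatcherDemo {
    // "Ends with digit(s)" matcher; "\\." in the Java literal is the regex \. (a literal period).
    static final Pattern ENDS_WITH_DIGITS = Pattern.compile("^(.*)([a-zA-Z\\.]+)(\\d+)$");

    /** Returns {group1, group2, group3} if the term matches, or null otherwise. */
    static String[] groups(String coreTerm) {
        Matcher m = ENDS_WITH_DIGITS.matcher(coreTerm);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        for (String term : new String[] {"disease3", "100.1", "Co-Q10", "2007"}) {
            String[] g = groups(term);
            System.out.println(term + " -> " + (g == null ? "no match" : String.join(" | ", g)));
        }
    }
}
```

For example, Co-Q10 decomposes into Co-, Q, and 10, so a split (if no filter intervened) would fall before 10.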

      Filters (Exceptions)
      1. [Upper]+ before ending digit
         • Regular expression: ^([A-Z]+)(\\d+)$
         • Examples: A1, UPD14, CAD106, A2780
      2. [char]+, [-], [char]+ before ending digit
         • Regular expression: ^([a-zA-Z]+)-([a-zA-Z]+)(\\d+)$
         • Examples: NCI-H460, CCRF-HSB2, Co-Q10, saframycin-Yd2
      3. [Greek alphabet] before ending digit
         • Regular expression: ^(.*)(alpha|beta|gamma|delta|epsilon)(\\d)$
         • Examples: alpha1, beta2, gamma2, epsilon4
      4. [char] before ending digit
         • Regular expression: ^([a-zA-Z])(\\d+)$
         • Examples: c7, A1
      • A1

  • Source Code: EndingDigitSplitter.java