
Lexical Tools

Strip Punctuation

  • Short Description: Strip punctuation.

  • Full Description:

    This flow strips punctuation from the input term. The stripped characters are not replaced by spaces. Punctuation characters are identified by the character types defined in the Java Character class and include:

    • DASH_PUNCTUATION (20): -
    • START_PUNCTUATION (21): ( { [
    • END_PUNCTUATION (22): ) } ]
    • CONNECTOR_PUNCTUATION (23): _
    • OTHER_PUNCTUATION (24): ! @ # % & * \ : ; " ' , . ? /
    • MATH_SYMBOL (25): ~ + = | < >
    • CURRENCY_SYMBOL (26): $
    • MODIFIER_SYMBOL (27): ` ^

    This flow has no effect on the -m option; "none" is added at the end of the output.
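    The character types listed above can be checked with Character.getType. The following is a minimal sketch, not the Lexical Tools source: the class name PunctuationTypes and the method isPunctuation are hypothetical names chosen for illustration, and the sketch assumes Java 9 or later for Set.of.

```java
import java.util.Set;

// Hypothetical helper for illustration; not part of the Lexical Tools source.
class PunctuationTypes {
    // The eight java.lang.Character type constants listed in this flow's description.
    static final Set<Integer> STRIPPED_TYPES = Set.of(
        (int) Character.DASH_PUNCTUATION,      // 20
        (int) Character.START_PUNCTUATION,     // 21
        (int) Character.END_PUNCTUATION,       // 22
        (int) Character.CONNECTOR_PUNCTUATION, // 23
        (int) Character.OTHER_PUNCTUATION,     // 24
        (int) Character.MATH_SYMBOL,           // 25
        (int) Character.CURRENCY_SYMBOL,       // 26
        (int) Character.MODIFIER_SYMBOL        // 27
    );

    // True if the character's type is one of the eight listed types.
    static boolean isPunctuation(char c) {
        return STRIPPED_TYPES.contains(Character.getType(c));
    }

    public static void main(String[] args) {
        // Print the numeric type and the strip decision for a few sample characters.
        for (char c : "-(_!+$^a 1".toCharArray()) {
            System.out.println("'" + c + "' type=" + Character.getType(c)
                + " stripped=" + isPunctuation(c));
        }
    }
}
```

    Letters, digits, and spaces fall outside these eight types, so they are left in place.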

  • Difference:
    1. The Java version trims output terms (removes spaces at the beginning and end of the term).
    2. Results differ when testing diacritics, such as \345\346... in the unit test.


  • Features:
    1. Strip a character from the input term if the character belongs to the above list.


  • Symbol: p

  • Examples:
    
    shell> lvg -f:p
    St. John's
    St. John's|St Johns|2047|16777215|p|1|
    

  • Implementation Logic:
    1. Go through every character in the input term; strip it if the character is a punctuation character.
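    The logic above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the actual ToStripPunctuation.java: the class name StripPunctuationSketch and the methods isPunctuation and strip are hypothetical, and the final trim() is assumed from item 1 of the Difference section.

```java
// Hypothetical sketch for illustration; not the actual ToStripPunctuation.java.
class StripPunctuationSketch {
    // The listed types DASH_PUNCTUATION (20) through MODIFIER_SYMBOL (27)
    // are contiguous, so a range check covers all eight.
    static boolean isPunctuation(char c) {
        int type = Character.getType(c);
        return type >= Character.DASH_PUNCTUATION
            && type <= Character.MODIFIER_SYMBOL;
    }

    // Copy every non-punctuation character; stripped characters are
    // dropped, not replaced by spaces.
    static String strip(String term) {
        StringBuilder out = new StringBuilder(term.length());
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (!isPunctuation(c)) {
                out.append(c);
            }
        }
        // Assumption: trim leading and trailing spaces, as the Difference
        // section describes for the Java version.
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("St. John's")); // prints "St Johns"
    }
}
```

    Because nothing is substituted for a stripped character, an input such as "a-b" becomes "ab" rather than "a b".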

  • Source Code: ToStripPunctuation.java

  • Hierarchy: Object -> Transformation -> ToStripPunctuation