CSpell

XML/HTML Handler: Correct XML/HTML Entity

  • Description:
    This class converts HTML/XML entities to their ASCII characters.

  • Features:
    Convert the following HTML/XML entities:

    Input     Output
    &lt;      <
    &gt;      >
    &amp;     &
    &quot;    "
    &nbsp;    (space)

  • Examples:

    File Name    Input          Output
    10058.txt    &amp;          &
    10715.txt    &quot;?        "?
    12190.txt    &quot; why     " why

  • Implementation Logic:
    • Store the conversions in a local HashMap, with each XML/HTML entity as the key and its converted ASCII character as the value.
    • Go through all keys:
      • If the input text contains the key, replace it with the converted ASCII character.
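
    The logic above can be sketched roughly as follows. This is an illustrative sketch only, not the actual XmlHtmlHandler.java source: the class name EntityConverterSketch and the method convert are hypothetical. It also assumes a LinkedHashMap rather than a plain HashMap, so that "&amp;" is replaced last and a double-escaped input such as "&amp;lt;" is decoded only once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual XmlHtmlHandler.java source.
class EntityConverterSketch {
    // Key: XML/HTML entity; value: converted ASCII character.
    private static final Map<String, String> ENTITY_MAP = new LinkedHashMap<>();
    static {
        ENTITY_MAP.put("&lt;", "<");
        ENTITY_MAP.put("&gt;", ">");
        ENTITY_MAP.put("&quot;", "\"");
        ENTITY_MAP.put("&nbsp;", " ");
        // Assumption: "&amp;" goes last so "&amp;lt;" becomes "&lt;", not "<".
        ENTITY_MAP.put("&amp;", "&");
    }

    // Go through all keys; if the text contains a key, replace it.
    static String convert(String text) {
        String out = text;
        for (Map.Entry<String, String> entry : ENTITY_MAP.entrySet()) {
            if (out.contains(entry.getKey())) {
                out = out.replace(entry.getKey(), entry.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Input taken from the Examples table (12190.txt).
        System.out.println(convert("&quot; why"));
    }
}
```

    With a plain HashMap the iteration order is unspecified, so the result for double-escaped input could vary between runs; the insertion-ordered map is a design choice of this sketch, not something stated on this page.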

  • Notes:
    • Baseline source code: PreProcXml.java
    • Bug fixes:
      • [& X] -> [&X]
      • [&....I] -> [&...I]
    • Action: Redesigned and implemented
    • Numeric entities of the form [&#ddd;] are not converted to ASCII. This conversion might be needed if such entities appear in the input text.
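
    If the [&#ddd;] conversion were ever needed, one possible approach is sketched below. This is not part of CSpell: the class name NumericEntitySketch, the method convert, and the choice to decode only printable ASCII codes (32 to 126) while leaving every other entity untouched are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; CSpell itself does not perform this conversion.
class NumericEntitySketch {
    // Matches decimal numeric entities with one to three digits, e.g. "&#38;".
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,3});");

    static String convert(String text) {
        Matcher matcher = DECIMAL_ENTITY.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            int code = Integer.parseInt(matcher.group(1));
            // Assumption: decode printable ASCII only; keep other entities as-is.
            String replacement = (code >= 32 && code <= 126)
                    ? String.valueOf((char) code)
                    : matcher.group();
            // quoteReplacement prevents '$' and '\' from being read as group references.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

    Under these assumptions, convert("&#38;") yields "&", while a non-ASCII entity such as "&#233;" is returned unchanged.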

  • Source Code: XmlHtmlHandler.java