Non-dictionary-based Corrections
This is the first step in spelling correction. It corrects errors that do not require a dictionary. The non-dictionary-based correction model consists of handlers and splitters, arranged as a chain of intermediate operators. They handle HTML/XML tags introduced by the software that consumers use to ask questions, informal expressions, and missing spaces adjacent to punctuation or digits. Pattern matching (regular expressions) and table lookup are used in this type of correction. The software components developed to resolve these issues are detailed as follows:
Types of Splitter | Error | Correction | File Name |
---|---|---|---|
Leading Digit Splitter | 20years | 20 years | 10349 |
Ending Digit Splitter | disease3 | disease 3 | 26 |
Leading Punctuation Splitter | volunteers( | volunteers ( | 12353 |
Ending Punctuation Splitter | cancer?if | cancer? if | 10004 |
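
The four splitters in the table above can be illustrated with a minimal Python sketch. This is not the tool's actual implementation: the regular expressions, the function names, and the `SPLITTER_CHAIN` list are assumptions chosen only to reproduce the examples in the table. The sketch also ignores exceptions that a real system would likely need, such as ordinals ("1st") or abbreviations.

```python
import re

# Hypothetical patterns, chosen only to reproduce the table's examples.
LEADING_DIGIT = re.compile(r"^(\d+)([A-Za-z]+)$")         # 20years  -> 20 years
ENDING_DIGIT = re.compile(r"^([A-Za-z]+)(\d+)$")          # disease3 -> disease 3
LEADING_PUNCT = re.compile(r"(?<=[A-Za-z0-9])([(\[{])")   # volunteers( -> volunteers (
ENDING_PUNCT = re.compile(r"([?!;:,)\]}])(?=[A-Za-z])")   # cancer?if -> cancer? if


def split_leading_digit(token: str) -> str:
    """Insert a space between leading digits and the letters that follow."""
    return LEADING_DIGIT.sub(r"\1 \2", token)


def split_ending_digit(token: str) -> str:
    """Insert a space between letters and the digits that end the token."""
    return ENDING_DIGIT.sub(r"\1 \2", token)


def split_leading_punctuation(token: str) -> str:
    """Insert a space before an opening bracket attached to a word."""
    return LEADING_PUNCT.sub(r" \1", token)


def split_ending_punctuation(token: str) -> str:
    """Insert a space after closing punctuation that runs into a word."""
    return ENDING_PUNCT.sub(r"\1 ", token)


# The splitters are applied in sequence, as a chain of operators.
SPLITTER_CHAIN = [
    split_leading_digit,
    split_ending_digit,
    split_leading_punctuation,
    split_ending_punctuation,
]


def correct(text: str) -> str:
    """Run every whitespace-delimited token through the splitter chain."""
    corrected = []
    for token in text.split():
        for splitter in SPLITTER_CHAIN:
            # A splitter may turn one token into several, so re-split
            # before handing the parts to the next splitter.
            token = " ".join(splitter(part) for part in token.split())
        corrected.append(token)
    return " ".join(corrected)
```

For example, under these assumed patterns `correct("20years")` yields `20 years` and `correct("cancer?if")` yields `cancer? if`. The period is deliberately left out of the ending-punctuation pattern in this sketch so that decimals and host names such as `cc.nih.gov` are not split.
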

Additional examples of missing-space errors and their corrections, listed by file name:

File Name | Error | Correction |
---|---|---|
14 | knowabout | know about |
26 | diseaseany | disease any |
11841 | Iam | I am |
11186 | tbinthe | tb in the |
14849 | shuntfrom | shunt from |
10349 | along | a long |