Non-word Correction
This page describes the algorithm for non-word correction.
I. Functions
II. Results on Training Set
Tests of CSpell ranking mode for non-word correction on the development set, using different function modes:
Function Mode | TP | Retrieved | Relevant | Precision | Recall | F1 |
---|---|---|---|---|---|---|
ESpell | 230 | 1180 | 774 | 0.1949 | 0.2972 | 0.2354 |
Jazzy (ASpell) | 186 | 393 | 774 | 0.4733 | 0.2403 | 0.3188 |
Ensemble | 552 | 825 | 774 | 0.6691 | 0.7132 | 0.6904 |
CSpell, non-dictionary-based | | | | | | |
non-dictionary-based | 340 | 373 | 774 | 0.9115 | 0.4393 | 0.5929 |
CSpell, non-word, Single Function | | | | | | |
1-to-1 | 588 | 699 | 774 | 0.8412 | 0.7597 | 0.7984 |
Split | 365 | 469 | 774 | 0.7783 | 0.4716 | 0.5873 |
Merge | 343 | 382 | 774 | 0.8979 | 0.4432 | 0.5934 |
CSpell, non-word, Combined Functions | | | | | | |
1-to-1 + Split | 603 | 724 | 774 | 0.8329 | 0.7791 | 0.8051 |
1-to-1 + Split + Merge | 606 | 731 | 774 | 0.8290 | 0.7829 | 0.8053 |
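
The performance figures can be reproduced from the raw data. The following is a minimal sketch in Python. It assumes that the three raw counts in each row are, in order, the number of correct corrections, the number of corrections returned, and the number of errors in the gold standard. That reading is consistent with the arithmetic of every row but is not stated on this page. The function name `prf` is illustrative only.

```python
def prf(true_positives, retrieved, relevant):
    """Return (precision, recall, F1) from three raw counts."""
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Raw counts taken from the "1-to-1 + Split + Merge" row above.
p, r, f = prf(606, 731, 774)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.8290 0.7829 0.8053
```
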
From the results:
III. Examples
Non-dictionary-based examples:

ID | Input | Output | Notes |
---|---|---|---|
ND-1 | "Good" | "Good" | Xml/Html handler |
ND-2 | pls | please | Informal Expression handler |
ND-3 | 20years | 20 years | Leading Digit Splitter |
ND-4 | from2007 | from 2007 | Ending Digit Splitter |
ND-5 | volunteers(healthy) | volunteers (healthy) | Leading Punctuation Splitter |
ND-6 | pain.help! | pain. help! | Ending Punctuation Splitter |
ND-7 | pain.pls help! | pain. please help! | Combo |
ND-8 | visit at pain.com! | visit at pain.com! | No correction! |
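
The non-dictionary handlers in examples ND-1 to ND-8 can be approximated with a few regular expressions. The sketch below is an illustration under stated assumptions, not CSpell's implementation: the informal-expression map, the URL pattern, and all function names are hypothetical, and ND-1 is assumed to have contained an escaped entity such as `&quot;` in its input.

```python
import html
import re

# Hypothetical, tiny informal-expression map; a real one would be far larger.
INFORMAL = {"pls": "please"}

# Crude stand-in for URL detection, so that "pain.com!" is left untouched.
URL_RE = re.compile(
    r"^(https?://|www\.)\S+$|^\S+\.(com|org|gov|edu|net)\W*$", re.IGNORECASE
)


def split_leading_digit(token):   # 20years -> 20 years
    return re.sub(r"^(\d+)([A-Za-z]+)$", r"\1 \2", token)


def split_ending_digit(token):    # from2007 -> from 2007
    return re.sub(r"^([A-Za-z]+)(\d+)$", r"\1 \2", token)


def split_leading_punct(token):   # volunteers(healthy) -> volunteers (healthy)
    return re.sub(r"(?<=\w)([(\[{])", r" \1", token)


def split_ending_punct(token):    # pain.help! -> pain. help!
    return re.sub(r"([.,;:!?)\]}])(?=\w)", r"\1 ", token)


SPLITTERS = (split_leading_digit, split_ending_digit,
             split_leading_punct, split_ending_punct)


def expand_informal(piece):
    """Replace an informal expression, keeping surrounding punctuation."""
    match = re.match(r"^(\W*)(\w+)(\W*)$", piece)
    if not match:
        return piece
    lead, core, trail = match.groups()
    return lead + INFORMAL.get(core.lower(), core) + trail


def correct_token(token):
    if URL_RE.match(token):       # ND-8: do not split inside a URL
        return token
    for splitter in SPLITTERS:
        token = splitter(token)
    return " ".join(expand_informal(p) for p in token.split(" "))


def correct_text(text):
    text = html.unescape(text)    # ND-1 (assumed): decode XML/HTML entities
    return " ".join(correct_token(t) for t in text.split())


print(correct_text("pain.pls help!"))      # pain. please help!
print(correct_text("visit at pain.com!"))  # visit at pain.com!
```

Splitting first and expanding informal expressions afterwards is what lets the combined case (ND-7) work in this sketch: `pain.pls` has to become `pain. pls` before `pls` can be looked up.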

Merge examples:

ID | Input | Output | Notes |
---|---|---|---|
M-1 | dur ing | during | Merge |
M-2 | non drug | nondrug | Merge |
M-3 | non protein | non-protein | Merge with hyphen |
M-4 | non surgical | non surgical | No merge |

Multiwords and their non-word elements:

Multiword | Non-word element |
---|---|
non surgical | non |
in vitro | vitro |
in vivo grown | vivo |
intra articular route | intra |
per se | se |
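
The merge behaviour in M-1 to M-4, together with the multiword table, can be sketched as follows. This is a simplified illustration, not CSpell's algorithm: the dictionary and multiword lists are hypothetical toy data, only adjacent pairs are considered, and output is lowercased. A pair is merged when the closed or hyphenated form is in the dictionary, unless the pair occurs inside a known multiword.

```python
# Hypothetical toy lexicon for illustration only.
DICTIONARY = {"during", "nondrug", "non-protein", "drug", "protein",
              "surgical", "grown", "route"}
MULTIWORDS = {"non surgical", "in vitro", "in vivo grown",
              "intra articular route", "per se"}

# Every adjacent word pair that occurs inside a known multiword is protected.
PROTECTED_PAIRS = {
    (a, b)
    for mw in MULTIWORDS
    for a, b in zip(mw.split(), mw.split()[1:])
}


def merge_pair(left, right):
    """Return the merged form of two tokens, or None if they should stay apart."""
    if (left, right) in PROTECTED_PAIRS:
        return None                       # M-4: part of a valid multiword
    if left in DICTIONARY and right in DICTIONARY:
        return None                       # both are real words, nothing to fix
    for candidate in (left + right, left + "-" + right):
        if candidate in DICTIONARY:
            return candidate              # M-1, M-2 (closed) or M-3 (hyphen)
    return None


def merge_text(text):
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            merged = merge_pair(tokens[i].lower(), tokens[i + 1].lower())
            if merged is not None:
                out.append(merged)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


for example in ("dur ing", "non drug", "non protein", "non surgical"):
    print(example, "->", merge_text(example))
```
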

1-to-1 examples:

ID | Input | Output |
---|---|---|
1-1 | good diagnosised | good diagnosis |
1-2 | was diagnosised with | was diagnosed with |
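
Examples 1-1 and 1-2 show the same non-word corrected differently depending on its neighbours. The sketch below is one way to get that effect: generate dictionary candidates within a small edit distance, then rank them by a context score. It is an illustration only. The dictionary, the bigram counts (invented numbers), the distance threshold, and the function names are all assumptions, and simple bigram counts stand in here for whatever context scoring CSpell actually uses.

```python
# Hypothetical toy data; the bigram counts are invented for illustration.
DICTIONARY = {"good", "was", "with", "diagnosis", "diagnosed", "diagnose"}
BIGRAMS = {
    ("good", "diagnosis"): 5,
    ("was", "diagnosed"): 9,
    ("diagnosed", "with"): 8,
    ("diagnosis", "with"): 1,
}


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def candidates(word, max_dist=2):
    return [w for w in DICTIONARY if edit_distance(word, w) <= max_dist]


def context_score(left, cand, right):
    return BIGRAMS.get((left, cand), 0) + BIGRAMS.get((cand, right), 0)


def correct_one_to_one(text):
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        cands = [] if tok in DICTIONARY else candidates(tok)
        if not cands:
            out.append(tok)               # real word, or nothing close enough
            continue
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        # Highest context score wins; ties go to the closer spelling, then
        # alphabetical order so the result is deterministic.
        best = max(cands, key=lambda c: (context_score(left, c, right),
                                         -edit_distance(tok, c), c))
        out.append(best)
    return " ".join(out)


print(correct_one_to_one("good diagnosised"))      # good diagnosis
print(correct_one_to_one("was diagnosised with"))  # was diagnosed with
```

Both `diagnosis` and `diagnosed` are two edits away from `diagnosised`, so spelling similarity alone cannot choose between them; in this sketch only the context score breaks the tie.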

Split examples:

ID | Input | Output |
---|---|---|
S-1 | thankyou | thank you |
S-2 | shuntfrom2007.how | shunt from 2007. how |
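
A dictionary-based split like S-1 and S-2 can be sketched as a search for the segmentation with the fewest dictionary words. The code below is an illustration under assumptions, not CSpell's method: the dictionary is hypothetical toy data, the cap of three pieces is arbitrary, and S-2 is handled by first separating digits and punctuation with regular expressions before the dictionary split is applied to the remaining letters.

```python
import re
from functools import lru_cache

# Hypothetical toy dictionary for illustration only.
DICTIONARY = {"thank", "you", "shunt", "from", "how"}


def split_word(word, max_pieces=3):
    """Return the fewest-piece split of word into dictionary words, or None."""

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(word):
            return ()
        result = None
        for end in range(start + 1, len(word) + 1):
            piece = word[start:end]
            if piece in DICTIONARY:
                rest = best(end)
                if rest is not None:
                    cand = (piece,) + rest
                    if result is None or len(cand) < len(result):
                        result = cand
        return result

    segmentation = best(0)
    if segmentation is None or len(segmentation) > max_pieces:
        return None
    return list(segmentation)


def presplit(token):
    """Separate letter/digit boundaries and put a space after punctuation."""
    token = re.sub(r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])", " ", token)
    token = re.sub(r"([.,;:!?])(?=\w)", r"\1 ", token)
    return token.split()


def split_text(text):
    out = []
    for token in text.split():
        for piece in presplit(token):
            if piece.isalpha() and piece.lower() not in DICTIONARY:
                segmentation = split_word(piece.lower())
                out.extend(segmentation if segmentation else [piece])
            else:
                out.append(piece)
    return " ".join(out)


print(split_text("thankyou"))           # thank you
print(split_text("shuntfrom2007.how"))  # shunt from 2007. how
```
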