Real-word Correction
This page describes the algorithm for real-word correction. In CSpell, real-word errors are detected and corrected on the fly, based on a context score, a word frequency score, and other heuristic rules. Neither a confusion set nor an assumption about the number of real-word errors is used.
I. Functions
II. Results on the Training Set
Different methods were tested on the training-set gold standard that includes real-word errors.
Method | TP | Returned | Gold | Precision | Recall | F1 |
---|---|---|---|---|---|---|
Ensemble (Use Non-Word on Real-Word) | 556 | 825 | 964 | 0.6739 | 0.5768 | 0.6216 |
Ensemble (Real-Word) | 517 | 718 | 964 | 0.7201 | 0.5363 | 0.6147 |
CSpell: NW | 609 | 731 | 964 | 0.8331 | 0.6317 | 0.7186 |
CSpell: NW + RW_Merge | 619 | 742 | 964 | 0.8342 | 0.6421 | 0.7257 |
CSpell: NW + RW_Split | 611 | 737 | 964 | 0.8290 | 0.6338 | 0.7184 |
CSpell: NW + RW_1To1 | 614 | 740 | 964 | 0.8297 | 0.6369 | 0.7207 |
CSpell: NW + RW_Merge + RW_Split | 621 | 747 | 964 | 0.8313 | 0.6442 | 0.7259 |
CSpell: NW + RW_Merge + RW_Split + RW_1To1 | 626 | 756 | 964 | 0.8280 | 0.6494 | 0.7279 |
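
The performance figures can be recomputed from the raw counts. The sketch below assumes that the three raw numbers in each row are, in order, the correct corrections (true positives), the corrections returned by the method, and the corrections in the gold standard; the reported values are consistent with that reading. The function name `prf` is illustrative only and is not part of CSpell.

```python
def prf(tp: int, returned: int, gold: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumed meaning of the counts (not stated explicitly in the table):
    tp = correct corrections, returned = corrections made by the method,
    gold = corrections in the gold standard.
    """
    precision = tp / returned
    recall = tp / gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Two rows taken from the table above.
rows = {
    "CSpell: NW": (609, 731, 964),
    "CSpell: NW + RW_Merge + RW_Split + RW_1To1": (626, 756, 964),
}

for name, counts in rows.items():
    p, r, f = prf(*counts)
    print(f"{name}: {p:.4f} {r:.4f} {f:.4f}")
```

Read this way, adding the three real-word modules to non-word correction raises recall (0.6317 to 0.6494) at a small cost in precision (0.8331 to 0.8280), for a net gain in F1.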
III. Examples
Real-word merge (M):

ID | Input | Output | Notes |
---|---|---|---|
M-1 | on set | on set | No merge |
M-2 | based on set criteria | based on set criteria | No merge |
M-3 | early on set | early onset | Merged |
M-4 | on set dementia | onset dementia | Merged |
M-5 | dianosed early on set deminita | diagnosed early onset dementia | Merged with other NW corrections |
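
Examples M-1 to M-4 show that the same two words, "on set", are merged only when the surrounding context supports "onset". The following is a minimal sketch of that idea, not CSpell's implementation: the lexicon, the bigram counts, and the functions `context_score` and `merge_real_words` are all hypothetical, and a raw sum of bigram counts stands in for whatever context score CSpell actually uses.

```python
# Hypothetical toy data; not CSpell's lexicon or context model.
LEXICON = {"onset", "on", "set", "early", "dementia", "based", "criteria"}
BIGRAM = {
    ("early", "onset"): 5,
    ("onset", "dementia"): 5,
    ("based", "on"): 5,
    ("set", "criteria"): 5,
    ("on", "set"): 1,
}


def context_score(tokens, start, end):
    """Sum toy bigram counts over tokens[start:end] plus one neighbor per side."""
    lo, hi = max(start - 1, 0), min(end + 1, len(tokens))
    window = tokens[lo:hi]
    return sum(BIGRAM.get(pair, 0) for pair in zip(window, window[1:]))


def merge_real_words(text):
    """Merge two adjacent words when the merged word is in the lexicon
    and scores strictly higher in context than the unmerged pair."""
    tokens = text.split()
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            merged = tokens[i] + tokens[i + 1]
            if merged in LEXICON:
                pos = len(out)
                candidate = out + [merged] + tokens[i + 2:]
                original = out + tokens[i:]
                if context_score(candidate, pos, pos + 1) > context_score(original, pos, pos + 2):
                    out.append(merged)
                    i += 2
                    continue
        out.append(tokens[i])
        i += 1
    return " ".join(out)


for example in ["on set", "based on set criteria", "early on set", "on set dementia"]:
    print(f"{example!r} -> {merge_real_words(example)!r}")
```

With this toy data, "on set" and "based on set criteria" are left unchanged, while "early on set" and "on set dementia" are merged. Comparing unnormalized sums over windows of different lengths is a simplification chosen to keep the sketch short.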

Real-word split (S):

ID | Input | Output | Notes |
---|---|---|---|
S-1 | along | along | No split |
S-2 | for along time | for a long time | Split |
S-3 | He is along | He is along | No split |
S-4 | He is a long with me | He is along with me | No split - Merge |

Real-word 1-to-1 (1):

ID | Input | Output |
---|---|---|
1-1 | foul small | foul smell |
1-2 | bad small | bad smell |
1-3 | small an odor | smell an odor |
1-4 | sense of small | sense of smell |
1-5 | taste and small | taste and smell |
1-6 | smell size | small size |
1-7 | smell amount | small amount |
1-8 | a smell sip of water | a small sip of water |
1-9 | smell intestine | small intestine |
1-10 | very smell | very small |
1-11 | relatively smell | relatively small |
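
In the 1-to-1 examples above, one valid word is replaced by another valid word ("small" and "smell") depending on its neighbors. The sketch below illustrates one way this could work without a confusion set: candidates are taken from a lexicon by single-character substitution and ranked by a context score. Everything in it is an assumption made for illustration, including the toy lexicon, the bigram counts, the substitution-only candidate rule, and the function names; it is not CSpell's implementation.

```python
# Hypothetical toy data; not CSpell's lexicon or context model.
LEXICON = {"small", "smell", "foul", "size", "sense", "of", "very"}
BIGRAM = {
    ("foul", "smell"): 5,
    ("of", "smell"): 5,
    ("small", "size"): 5,
    ("very", "small"): 5,
}


def one_substitution_apart(a, b):
    """True if a and b have equal length and differ in exactly one character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def context_score(tokens, i):
    """Sum the toy bigram counts of tokens[i] with its left and right neighbors."""
    left = BIGRAM.get((tokens[i - 1], tokens[i]), 0) if i > 0 else 0
    right = BIGRAM.get((tokens[i], tokens[i + 1]), 0) if i + 1 < len(tokens) else 0
    return left + right


def correct_one_to_one(text):
    """Replace a word by a lexicon neighbor only if the neighbor scores
    strictly higher in context than the original word."""
    tokens = text.lower().split()
    for i, token in enumerate(tokens):
        best, best_score = token, context_score(tokens, i)
        for candidate in sorted(LEXICON):
            if one_substitution_apart(token, candidate):
                trial = tokens[:i] + [candidate] + tokens[i + 1:]
                score = context_score(trial, i)
                if score > best_score:
                    best, best_score = candidate, score
        tokens[i] = best
    return " ".join(tokens)


for example in ["foul small", "sense of small", "smell size", "very smell", "very small"]:
    print(f"{example!r} -> {correct_one_to_one(example)!r}")
```

Requiring a strictly higher score means a word that already fits its context, such as "small" in "very small", is left alone.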