Dictionary Functions - Check Valid Word
I. Introduction
In cSpell, all tokens used for spelling error detection are single words. Thus, only single words need to be in the dictionary. This page describes which dictionary should be used for spelling error detection.
II. Algorithm
Both the whole token and the core-term of the token are checked for valid spelling (Is-Valid-Word).
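The check described above can be sketched roughly as follows. This is a minimal illustration, not cSpell's implementation: the names `core_term` and `is_valid_word`, the rule that the core-term is the token with leading and trailing punctuation stripped, the case-insensitive lookup, the choice to accept a token when either form is found, and the toy dictionary are all assumptions.

```python
import string
from typing import Set


def core_term(token: str) -> str:
    """Assumed definition: the token with leading/trailing punctuation removed."""
    return token.strip(string.punctuation)


def is_valid_word(token: str, dictionary: Set[str]) -> bool:
    """Return True if the whole token or its core-term is in the dictionary.

    Assumption: lookup is case-insensitive and either form is sufficient.
    """
    candidates = {token.lower(), core_term(token).lower()}
    candidates.discard("")  # a token made only of punctuation has no core-term
    return any(candidate in dictionary for candidate in candidates)


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"diabetes", "insulin", "dr."}
```

Under these assumptions, `is_valid_word("diabetes,", toy_dictionary)` succeeds through the core-term, `is_valid_word("Dr.", toy_dictionary)` succeeds through the whole token, and `is_valid_word("diabtes", toy_dictionary)` fails both checks.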
III. Results
Test cSpell with different dictionaries:
* noAaDic: En + Pn
* eng_medical.dic:
* Lexicon.dic:
IV. Tests
Test-1: Tests on Baseline + Lexicon (not used; results are included from above)
| Dictionary | TP | Ret | Rel | Precision | Recall | F1 | Notes |
|---|---|---|---|---|---|---|---|
| Lexicon (single-word + multiwords) | | | | | | | |
| | 535 | 858 | 814 | 0.6235 | 0.6572 | 0.6400 | |
| | 530 | 877 | 814 | 0.6043 | 0.6511 | 0.6268 | |
| Lexicon (single-words) | | | | | | | |
| | 531 | 808 | 814 | 0.6572 | 0.6523 | 0.6547 | |
| | 535 | 858 | 814 | 0.6235 | 0.6572 | 0.6400 | |
| | 530 | 877 | 814 | 0.6043 | 0.6511 | 0.6268 | |
| Combined (10 spVar are included in Lexicon) | | | | | | | |
| | 529 | 740 | 814 | 0.7149 | 0.6499 | 0.6808 | |
| | 533 | 745 | 814 | 0.7154 | 0.6548 | 0.6838 | |
| | 537 | 745 | 814 | 0.7208 | 0.6597 | 0.6889 | |
| TBD | | | | | | | |
| | 549 | 745 | 814 | 0.7369 | 0.6744 | 0.7043 | |
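The Precision, Recall, and F1 columns can be reproduced from the TP, Ret, and Rel counts. The sketch below assumes TP is the number of true positives, Ret the number of retrieved (flagged) items, and Rel the number of relevant (gold-standard) items; these readings are inferred from the column names and are consistent with the tabulated values. The function name `precision_recall_f1` is hypothetical.

```python
from typing import Tuple


def precision_recall_f1(tp: int, ret: int, rel: int) -> Tuple[float, float, float]:
    """Compute precision = TP/Ret, recall = TP/Rel, and their harmonic mean (F1)."""
    precision = tp / ret if ret else 0.0
    recall = tp / rel if rel else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    # Round to four decimals to match the precision used in the tables.
    return round(precision, 4), round(recall, 4), round(f1, 4)


# Counts taken from the first data row of the table above.
first_row = precision_recall_f1(535, 858, 814)
```

For the first data row (535, 858, 814), this gives 0.6235, 0.6572, and 0.6400, matching the table.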
Test-2: Tests on split dictionaries
| Dictionary | TP | Ret | Rel | Precision | Recall | F1 | Notes |
|---|---|---|---|---|---|---|---|
| Use Baseline Dictionary for check and suggest | | | | | | | |
| | 546 | 820 | 814 | 0.6659 | 0.6708 | 0.6683 | |
| | 547 | 810 | 814 | 0.6753 | 0.6720 | 0.6736 | Add 10 files for spVars |
| | 548 | 765 | 814 | 0.7163 | 0.6732 | 0.6941 | Check proper nouns from Lexicon |
| | 547 | 804 | 814 | 0.6803 | 0.6720 | 0.6761 | Check Abb/Acr from Lexicon |
| | 548 | 759 | 814 | 0.7220 | 0.6732 | 0.6968 | Check proper nouns/Abb/Acr from Lexicon |
| | 544 | 747 | 814 | 0.7282 | 0.6683 | 0.6970 | Add SpVar from Lexicon |
| | 543 | 746 | 814 | 0.7279 | 0.6671 | 0.6962 | Replace 10 files by Lexicon.spVar |
| | 543 | 749 | 814 | 0.7250 | 0.6671 | 0.6948 | Adding SpVar to the dictionary decreases F1 because it is also used for suggestion (a better ranking system is needed) |
| | 543 | 746 | 814 | 0.7279 | 0.6671 | 0.6962 | Add numbers; no change because of the data |
| Implement 2 Dictionaries: Check + Suggest | | | | | | | |
| Find the Check dictionary | | | | | | | |
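The idea of splitting the dictionary into a Check dictionary and a Suggest dictionary can be illustrated with a small sketch. It is not cSpell's implementation: the class `SplitDictionarySpeller`, its method names, and the word lists are hypothetical, and Python's `difflib.get_close_matches` stands in for cSpell's own candidate generation and ranking.

```python
import difflib
from typing import Iterable, List


class SplitDictionarySpeller:
    """Hypothetical speller with separate dictionaries for checking and suggesting."""

    def __init__(self, check_words: Iterable[str], suggest_words: Iterable[str]):
        # Words accepted as valid during error detection.
        self.check_words = {w.lower() for w in check_words}
        # Words that may be offered as corrections.
        self.suggest_words = sorted({w.lower() for w in suggest_words})

    def is_valid(self, token: str) -> bool:
        return token.lower() in self.check_words

    def suggest(self, token: str, n: int = 3) -> List[str]:
        """Return up to n candidates from the suggest dictionary only."""
        if self.is_valid(token):
            return []
        # difflib is a stand-in; cSpell's ranking is not modelled here.
        return difflib.get_close_matches(token.lower(), self.suggest_words, n=n)


# Toy data: the spelling variant "colour" is valid for checking
# but is kept out of the suggestion candidates.
speller = SplitDictionarySpeller(
    check_words={"color", "colour", "diabetes"},
    suggest_words={"color", "diabetes"},
)
```

With this split, `speller.is_valid("colour")` is true, so the variant is not flagged as an error, while `speller.suggest("colur")` draws only from the suggest list. This mirrors the note above that adding spelling variants to a single shared dictionary also exposes them as suggestion candidates.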
Test-3: Tests on the suggestion dictionary
| Dictionary | TP | Ret | Rel | Precision | Recall | F1 | Notes |
|---|---|---|---|---|---|---|---|
| Find the Suggest dictionary | | | | | | | |