CSpell

Performance Tests on Context Window Size

I. Test Setup

Data: Training Set
Gold Standard: non-word only
Dictionary: CSpell (Lexicon-based)
Corpus: Consumer health corpus
Ranking: Context score and CSpell ranking

II. Test Results

Tests on various context window sizes in context score ranking

Context Radius	Precision	Recall	F1
1	0.7780	0.6111	0.6845
2	0.8035	0.5917	0.6815
3	0.8044	0.5685	0.6662
4	0.8156	0.5543	0.6600
5	0.8252	0.5491	0.6594
6	0.8281	0.5413	0.6547
7	0.8240	0.5323	0.6468
8	0.8320	0.5310	0.6483
9	0.8443	0.5323	0.6529
10	0.8374	0.5258	0.6460
25	0.8433	0.5078	0.6339
50	0.8442	0.5039	0.6311
100	0.8442	0.5039	0.6311

Tests on various context window sizes in CSpell score ranking

Context Radius	Precision	Recall	F1
1	0.8380	0.7817	0.8088
2	0.8407	0.7842	0.8115
3	0.8366	0.7804	0.8075
4	0.8352	0.7791	0.8061
5	0.8352	0.7791	0.8061
6	0.8296	0.7739	0.8008
7	0.8310	0.7752	0.8021
8	0.8310	0.7752	0.8021
9	0.8310	0.7752	0.8021
10	0.8296	0.7739	0.8008
25	0.8283	0.7726	0.7995
50	0.8283	0.7726	0.7995
100	0.8283	0.7726	0.7995
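
The F1 columns above appear consistent with the usual harmonic mean of precision and recall. The page does not state the formula, so the following minimal Python sketch assumes that standard definition. It recomputes F1 for a few rows copied from the CSpell score ranking table and picks the radius with the highest value. The names `f1_score`, `cspell_rows`, and `f1_by_radius` are illustrative only. Because the published precision and recall are already rounded, a recomputed F1 may differ from the table in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (radius, precision, recall) rows copied from the CSpell score ranking table
cspell_rows = [
    (1, 0.8380, 0.7817),
    (2, 0.8407, 0.7842),
    (3, 0.8366, 0.7804),
    (10, 0.8296, 0.7739),
    (100, 0.8283, 0.7726),
]

# Recompute F1 per radius, rounded to the four decimals used in the table
f1_by_radius = {radius: round(f1_score(p, r), 4) for radius, p, r in cspell_rows}

# Radius with the highest recomputed F1 among the sampled rows
best_radius = max(f1_by_radius, key=f1_by_radius.get)

print(f1_by_radius)
print(best_radius)
```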

III. Discussion

Nearby (local) context matters more than distant (global) context.
Distant (global) context contributes little to the context score: in context score ranking, F1 is highest at radius 1 (0.6845), and the results at radius 50 and radius 100 are identical.
The context radius should match the window size used in the training set: training window size = 2 * context radius + 1.
A radius of 2 (total window size of 5) was chosen because it gives the best F1 score (0.8115) in CSpell ranking.
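
To make the radius and window-size relationship concrete, here is a minimal Python sketch. It is not CSpell's actual code: the function names `window_size` and `context_window`, the whitespace tokenization, and the sample sentence are all assumptions made for illustration. It also assumes the window is simply clipped at the start and end of the text.

```python
def window_size(radius: int) -> int:
    """Total window size: the target token plus `radius` tokens on each side."""
    return 2 * radius + 1


def context_window(tokens: list[str], index: int, radius: int) -> list[str]:
    """Return the tokens within `radius` positions of tokens[index].

    The window is clipped at the text boundaries, so it can be shorter
    than window_size(radius) near the start or end of the text.
    """
    start = max(0, index - radius)
    end = min(len(tokens), index + radius + 1)
    return tokens[start:end]


# Hypothetical consumer-health sentence with one misspelled token
tokens = "my doctor gave me a new perscription for my diabetes".split()
target = tokens.index("perscription")

window = context_window(tokens, target, radius=2)
print(window_size(2))
print(window)
```

With a radius of 2, the window holds at most five tokens: the target plus two neighbors on each side.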