Ensemble Performance
This page describes the initial performance tests on the Ensemble method (from Dr. Halil).
The source code of Ensemble Spelling Correction, which is used as the baseline for development and comparison, performs slightly better than what was reported in the paper, due to the following reasons.
The results for 472 files are listed in the following tables (tested on lexdev):
Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | RunTime |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 289 | 58 | 525 | 347 | 814 | 0.8329 | 0.3550 | 0.4978 | 87 min. |
Non-word | W/ Orthographic similarity | 495 | 329 | 319 | 824 | 814 | 0.6007 | 0.6081 | 0.6044 | 82 min. |
Non-word | W/ Corpus Frequency | 361 | 449 | 453 | 810 | 814 | 0.4457 | 0.4435 | 0.4446 | 83 min. |
Non-word | W/ Context Similarity | 350 | 457 | 464 | 807 | 814 | 0.4337 | 0.4300 | 0.4318 | 80 min. |
Non-word | All (Ensemble) | 531 | 294 | 283 | 825 | 814 | 0.6436 | 0.6523 | 0.6480 | 80 min. |
Real-word | All (Ensemble) |

Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | RunTime |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 221 | 53 | 416 | 274 | 637 | 0.8066 | 0.3469 | 0.4852 | 80 min. |
Non-word | W/ Orthographic similarity | 388 | 267 | 249 | 655 | 637 | 0.5924 | 0.6091 | 0.6006 | 71 min. |
Non-word | W/ Corpus Frequency | 278 | 363 | 359 | 641 | 637 | 0.4337 | 0.4364 | 0.4351 | 72 min. |
Non-word | W/ Context Similarity | 268 | 371 | 369 | 639 | 637 | 0.4194 | 0.4207 | 0.4201 | 70 min. |
Non-word | All (Ensemble) | 413 | 243 | 224 | 656 | 637 | 0.6296 | 0.6484 | 0.6388 | 70 min. |
Real-word | All (Ensemble) |

Type | Option | TP | FP | FN | Retrieved | Relevant | Precision | Recall | F-1 | RunTime |
---|---|---|---|---|---|---|---|---|---|---|
Non-word | PreProcess Only | 68 | 5 | 109 | 73 | 177 | 0.9315 | 0.3842 | 0.5440 | 10 min. |
Non-word | W/ Orthographic similarity | 107 | 62 | 70 | 169 | 177 | 0.6331 | 0.6045 | 0.6185 | 10 min. |
Non-word | W/ Corpus Frequency | 83 | 86 | 94 | 169 | 177 | 0.4911 | 0.4689 | 0.4798 | 10 min. |
Non-word | W/ Context Similarity | 83(82) | 85(86) | 94(95) | 168 | 177 | 0.4940 | 0.4689 | 0.4812 | 10 min. |
Non-word | All (Ensemble) | 117 | 52 | 60 | 169 | 177 | 0.6923 | 0.6610 | 0.6763 | 10 min. |
Real-word | All (Ensemble) |
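
The derived columns in the tables above can be recomputed from the raw TP, FP, and FN counts. The sketch below assumes the standard definitions (Retrieved = TP + FP, Relevant = TP + FN, Precision = TP / Retrieved, Recall = TP / Relevant, F-1 = harmonic mean of precision and recall), which agree with the reported rows. The `Counts` class and its property names are hypothetical and are not part of the Ensemble source code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Raw evaluation counts for one table row (hypothetical helper)."""
    tp: int  # true positives: errors corrected correctly
    fp: int  # false positives: wrong or unnecessary corrections
    fn: int  # false negatives: errors that were missed

    @property
    def retrieved(self) -> int:
        # Everything the system flagged and corrected.
        return self.tp + self.fp

    @property
    def relevant(self) -> int:
        # Everything in the gold standard that should have been corrected.
        return self.tp + self.fn

    @property
    def precision(self) -> float:
        return self.tp / self.retrieved if self.retrieved else 0.0

    @property
    def recall(self) -> float:
        return self.tp / self.relevant if self.relevant else 0.0

    @property
    def f1(self) -> float:
        # Equivalent to 2PR / (P + R), written on counts to avoid
        # dividing by zero when both precision and recall are 0.
        denom = self.retrieved + self.relevant
        return 2 * self.tp / denom if denom else 0.0


# "Non-word | All (Ensemble)" row of the first table.
ensemble = Counts(tp=531, fp=294, fn=283)
print(ensemble.retrieved, ensemble.relevant)
print(f"{ensemble.precision:.4f} {ensemble.recall:.4f} {ensemble.f1:.4f}")
```

For that row, the script prints `825 814` followed by `0.6436 0.6523 0.6480`, which matches the reported values.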