
CSpell

Performance Tests on Training Set

I. Test Setup

  • Data: Training Set
  • The corrected data from ESpell and Jazzy, provided by Dr. Kilicoglu, are used directly for these test results.
  • The Ensemble program from Dr. Kilicoglu has been enhanced since the version described in the Ensemble paper, so its results here are slightly better than those reported in the paper.

II. Test Results

  • Non-word Only:

    Non-word, Detection
    Method    TP   FP   FN   T. Ret  T. Rel  Precision  Recall  F1
    ESpell    395  785  379  1180    774     0.3347     0.5103  0.4043
    Jazzy     324  69   450  393     774     0.8244     0.4186  0.5553
    Ensemble  655  170  119  825     774     0.7939     0.8463  0.8193
    CSpell    667  55   107  722     774     0.9238     0.8618  0.8917

    Non-word, Correction
    Method    TP   FP   FN   T. Ret  T. Rel  Precision  Recall  F1
    ESpell    237  943  537  1180    774     0.2008     0.3062  0.2426
    Jazzy     187  206  587  393     774     0.4758     0.2416  0.3205
    Ensemble  552  273  222  825     774     0.6691     0.7132  0.6904
    CSpell    607  115  167  722     774     0.8407     0.7842  0.8115

  • Real-word Included:

    Real-word Included, Detection
    Method    TP   FP   FN   T. Ret  T. Rel  Precision  Recall  F1
    ESpell    410  770  554  1180    964     0.3475     0.4253  0.3825
    Jazzy     334  59   630  393     964     0.8499     0.3465  0.4923
    Ensemble  580  138  384  718     964     0.8078     0.6017  0.6897
    CSpell    692  53   272  745     964     0.9289     0.7178  0.8098

    Real-word Included, Correction
    Method    TP   FP   FN   T. Ret  T. Rel  Precision  Recall  F1
    ESpell    245  935  719  1180    964     0.2076     0.2541  0.2285
    Jazzy     191  202  773  393     964     0.4860     0.1981  0.2815
    Ensemble  517  201  447  718     964     0.7201     0.5363  0.6147
    CSpell    627  118  337  745     964     0.8416     0.6504  0.7338

  • Speed:
    • Elapsed time: 56.91 sec
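
The Precision, Recall, and F1 columns in the tables above appear to follow the standard definitions, with T. Ret (total retrieved) equal to TP + FP and T. Rel (total relevant) equal to TP + FN. The sketch below is not part of CSpell; it assumes those standard formulas, and the function name `prf1` is hypothetical. It recomputes the CSpell non-word detection row from its raw counts as a consistency check.

```python
def prf1(tp, fp, fn):
    """Return (precision, recall, F1) from raw counts.

    Assumes the standard definitions:
      precision = TP / (TP + FP)   # TP + FP is "T. Ret"
      recall    = TP / (TP + FN)   # TP + FN is "T. Rel"
      F1        = harmonic mean of precision and recall
    """
    retrieved = tp + fp
    relevant = tp + fn
    precision = tp / retrieved if retrieved else 0.0
    recall = tp / relevant if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# CSpell, non-word detection row: TP=667, FP=55, FN=107
p, r, f1 = prf1(667, 55, 107)
print(f"{p:.4f} {r:.4f} {f1:.4f}")  # prints "0.9238 0.8618 0.8917"
```

The same call with any other row's TP, FP, and FN values can be used to check that row against the table.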

III. Discussion

  • Ensemble outperformed ESpell and Jazzy (ASpell) by a large margin (over 30 percentage points in correction F1) because Ensemble was developed to correct errors in consumer health questions.
  • The F1 improvement from Ensemble to CSpell for non-word detection and correction is 7.24 and 12.11 percentage points, respectively.
  • The F1 improvement from Ensemble to CSpell for real-word included detection and correction is 12.01 and 11.91 percentage points, respectively.
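
The improvement figures in the discussion match the absolute difference in F1 between CSpell and Ensemble, expressed in percentage points. As a minimal sketch under that assumption, the snippet below recomputes them from the F1 values in the result tables; the dictionary layout and variable names are illustrative only.

```python
# F1 scores copied from the result tables: (Ensemble, CSpell)
f1_scores = {
    ("non-word", "detection"): (0.8193, 0.8917),
    ("non-word", "correction"): (0.6904, 0.8115),
    ("real-word included", "detection"): (0.6897, 0.8098),
    ("real-word included", "correction"): (0.6147, 0.7338),
}

# Absolute F1 gain of CSpell over Ensemble, in percentage points
improvements = {
    key: round((cspell - ensemble) * 100, 2)
    for key, (ensemble, cspell) in f1_scores.items()
}

for (error_type, task), gain in improvements.items():
    print(f"{error_type}, {task}: +{gain:.2f} points")
```

These are absolute differences; a relative improvement (gain divided by the Ensemble F1) would give larger numbers and is not what the discussion reports.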