CSpell

Performance Tests for Ensemble

I. Introduction

Performance tests are run on the test set with the Ensemble spelling method, which serves as the baseline for comparison with CSpell.

II. Setup

  • Program:
    ${C_SPELL}/SpellCorrection/bin/runSpellingAllData
    4 (CSpell data - NER)
    3, 4 (nonword, real-word)
    4 (methods)
  • InData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/
  • OutData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/LinearWeighted_nw_OUT_4
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData/LinearWeighted_rw_OUT_4

    Backup on:

    • ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultCSpellData.baseline
    • ${C_SPELL}/PostProcess/data/Test/NewTest/TestData/9_Baseline/Offical/*

III. Performance Results

  • Non-word Only GoldStd

    Revised GoldStd
    Methods     | TP  | Ret | Rel | Precision | Recall | F1
    4. Ensemble | 559 | 966 | 974 | 0.5787    | 0.5739 | 0.5763

  • Real-word Included GoldStd (only the Ensemble option works for real-word)

    Revised GoldStd
    Methods          | TP  | Ret | Rel  | Precision | Recall | F1
    4. Ensemble (NW) | 560 | 966 | 1178 | 0.5797    | 0.4754 | 0.5224
    4. Ensemble (RW) | 520 | 810 | 1178 | 0.6420    | 0.4414 | 0.5231

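The non-word and real-word options of Ensemble give nearly the same F1 on the real-word included gold standard (0.5224 vs. 0.5231). The real-word option has higher precision (0.6420 vs. 0.5797) but lower recall (0.4414 vs. 0.4754).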
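
The precision, recall, and F1 values reported above can be reproduced from the TP|Ret|Rel counts. The following is a minimal sketch, assuming that Ret is the number of corrections the method returned and Rel is the number of errors in the gold standard. That reading is an assumption, though it matches the reported figures. The function name prf is hypothetical and is not part of CSpell.

```python
def prf(tp, ret, rel):
    """Return (precision, recall, F1) from true positives, retrieved, and relevant counts.

    Assumption: ret = corrections returned by the method,
                rel = errors annotated in the gold standard.
    """
    precision = tp / ret if ret else 0.0
    recall = tp / rel if rel else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts copied from the result tables (TP, Ret, Rel).
rows = {
    "Ensemble, non-word only GoldStd": (559, 966, 974),
    "Ensemble (NW), real-word included GoldStd": (560, 966, 1178),
    "Ensemble (RW), real-word included GoldStd": (520, 810, 1178),
}

for name, counts in rows.items():
    p, r, f = prf(*counts)
    print(f"{name}: {p:.4f}|{r:.4f}|{f:.4f}")
```

Because the real-word included gold standard has more relevant items (1178 vs. 974), recall for the same non-word run drops while precision stays almost unchanged.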