CSpell

Performance Tests - Ensemble on Training Set

I. Introduction

Performance tests were run on the training set to compare the different ranking methods of Ensemble Spelling (original code).

II. Setup

  • Program:
    ${C_SPELL}/SpellCorrection/bin/runSpellingAllData
    0 (all data)
    3, 4 (nonword, real-word)
    0,1,2,3,4 (methods)
  • InData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/AllData/
  • OutData:
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData/LinearWeighted_nw_OUT_*
    ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData/LinearWeighted_rw_OUT_*

    Backup on:

    • ${C_SPELL}/SpellCorrection/CHQA_SpellCorrection_Dataset/ResultAllData.baseline
    • ${C_SPELL}/PostProcess/data/Test/Baseline/TestData/9_Baseline/Offical/*

III. Performance Results

  • Non-word Only

    Methods               Original GoldStd                    Revised GoldStd
                          TP|Ret|Rel   Precision|Recall|F1    TP|Ret|Rel   Precision|Recall|F1
    0. PreProcess         289|347|814  0.8329|0.3550|0.4978   289|347|774  0.8329|0.3734|0.5156
    1. Orthographic       495|824|814  0.6007|0.6081|0.6044   511|824|774  0.6201|0.6602|0.6395
    2. Corpus Frequency   361|810|814  0.4457|0.4435|0.4446   366|810|774  0.4519|0.4729|0.4621
    3. Word Embedding     350|807|814  0.4337|0.4300|0.4318   358|807|774  0.4436|0.4625|0.4529
    4. Ensemble           530|825|814  0.6424|0.6511|0.6467   552|825|774  0.6691|0.7132|0.6904

  • Real-word Included (using the Ensemble option, which also works for real-word)

    Methods               Original GoldStd                    Revised GoldStd
                          TP|Ret|Rel   Precision|Recall|F1    TP|Ret|Rel   Precision|Recall|F1
    Ensemble (non-word)   531|825|926  0.6436|0.5734|0.6065   556|825|964  0.6739|0.5768|0.6216
    Ensemble (real-word)  498|718|926  0.6936|0.5378|0.6058   517|718|964  0.7201|0.5363|0.6147
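
The Precision|Recall|F1 values in the tables above are consistent with the standard definitions applied to the TP|Ret|Rel counts. This reading assumes that Ret is the number of corrections returned by the method and Rel is the number of relevant corrections in the gold standard; the page does not define these terms. The following Python sketch reproduces a table cell from its counts. The function names `prf` and `format_cell` are hypothetical and are not part of CSpell.

```python
def prf(tp: int, ret: int, rel: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Assumed meaning of the counts (not stated in the source):
      tp  = true positives (correct corrections)
      ret = retrieved (corrections the method returned)
      rel = relevant (corrections in the gold standard)
    """
    precision = tp / ret if ret else 0.0
    recall = tp / rel if rel else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def format_cell(counts: str) -> str:
    """Turn a 'TP|Ret|Rel' cell into a 'Precision|Recall|F1' cell."""
    tp, ret, rel = (int(part) for part in counts.split("|"))
    return "|".join(f"{value:.4f}" for value in prf(tp, ret, rel))


# Ensemble, non-word only, original gold standard
print(format_cell("530|825|814"))  # 0.6424|0.6511|0.6467
```

Applying the same function to the revised gold standard counts (for example `552|825|774`) shows why only recall and F1 shift when Ret is unchanged: Ret is the denominator of precision, while Rel is the denominator of recall.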