CSpell

Performance Tests on Orthographic Similarity Score

I. Test Setup

  • Data: Training Set
  • Gold Standard: non-word only
  • Dictionary: CSpell (Lexicon-based)
  • Corpus: none
  • Ranking: Orthographic ranking

II. Test Results

  • Tests on token (edit distance), phonetic, and overlap similarity scores:
    ID   Ranking        Precision  Recall  F1
    0-1  Edit Distance  0.7606     0.7636  0.7621
    0-2  Phonetic       0.7490     0.7519  0.7505
    0-3  Overlap        0.7542     0.7571  0.7556

  • Tests on orthographic similarity scores using various weighting factors (WF) of token (edit distance), phonetic, and overlap similarity scores:

    ID  Edit Distance  Phonetic  Overlap  Precision  Recall  F1      Notes
    1   1.00           1.00      1.00     0.7580     0.7610  0.7595  Same ratio of WF
    2   0.95           0.95      0.95     0.7580     0.7610  0.7595
    3   0.90           0.90      0.90     0.7580     0.7610  0.7595
    4   1.00           0.90      0.90     0.7593     0.7623  0.7608  Increase 1 WF
    5   0.90           1.00      0.90     0.7580     0.7610  0.7595
    6   0.90           0.90      1.00     0.7580     0.7610  0.7595
    7   0.80           0.90      0.90     0.7580     0.7610  0.7595  Decrease 1 WF
    8   0.90           0.80      0.90     0.7593     0.7623  0.7608
    9   0.90           0.90      0.80     0.7580     0.7610  0.7595
    10  1.00           0.80      0.90     0.7593     0.7623  0.7608  Trial and error: increase edit distance WF, decrease phonetic WF
    11  1.00           0.80      0.85     0.7593     0.7623  0.7608
    12  1.00           0.70      0.80     0.7606     0.7636  0.7621
    13  1.00           0.70      0.90     0.7593     0.7623  0.7608
    14  1.00           0.00      0.00     0.7606     0.7636  0.7621
    15  1.00           0.50      0.80     0.7606     0.7636  0.7621
    16  1.00           0.60      0.80     0.7606     0.7636  0.7621
    17  1.00           0.65      0.80     0.7606     0.7636  0.7621
    18  1.00           0.65      0.90     0.7606     0.7636  0.7621
    19  1.00           0.65      0.90     0.7606     0.7636  0.7621
    20  1.00           0.75      0.90     0.7593     0.7623  0.7608
    21  1.00           0.85      0.90     0.7593     0.7623  0.7608
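
The F1 values in the result tables can be cross-checked against the precision and recall columns. The sketch below assumes F1 is the standard harmonic mean of precision and recall, which the page does not state explicitly; the function name f1_score is illustrative only. Because the reported precision and recall are rounded to four decimals, a recomputed F1 may differ from the reported value in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the result tables.
reported = {
    "0-1 edit distance": (0.7606, 0.7636, 0.7621),
    "0-2 phonetic": (0.7490, 0.7519, 0.7505),
    "0-3 overlap": (0.7542, 0.7571, 0.7556),
    "test 1": (0.7580, 0.7610, 0.7595),
    "test 4": (0.7593, 0.7623, 0.7608),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed {f1_score(p, r):.4f}, reported {f1:.4f}")
```

Every row of the second table carries one of only three (precision, recall, F1) triples, so checking tests 1 and 4 together with test 0-1 covers all of them.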

III. Discussion

  • From tests 0-1 to 0-3, the individual similarity scores rank by F1 in the order edit distance (0.7621), overlap (0.7556), phonetic (0.7505).
  • Tests 1-3 give identical results: weighting factors in the same ratio lead to the same results.
  • In tests 4-6, raising the weighting factor of the edit distance similarity score improved F1 (0.7608 in test 4), while raising the phonetic or overlap factor left it unchanged (0.7595).
  • In tests 7-9, lowering the weighting factor of the phonetic similarity score improved F1 (0.7608 in test 8), while lowering the edit distance or overlap factor left it unchanged (0.7595).
  • Tests 10-21 search for the best F1 by trial and error. The highest F1 observed, 0.7621, is reached in tests 12 and 14-19.
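
The observation that equal-ratio weighting factors give identical results can be illustrated with a minimal sketch. It assumes the orthographic score is a weighted sum of the three similarity scores, which this page does not confirm. The function names, candidate words, and similarity values are all made up for illustration and are not CSpell's API or data.

```python
from typing import Dict, List, Tuple

# (edit distance, phonetic, overlap) similarity scores or weighting factors
Triple = Tuple[float, float, float]


def orthographic_score(scores: Triple, weights: Triple) -> float:
    """ASSUMPTION: the combined score is a weighted sum of the three scores."""
    return sum(wf * s for wf, s in zip(weights, scores))


def rank(candidates: Dict[str, Triple], weights: Triple) -> List[str]:
    """Return candidate words ordered from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda word: orthographic_score(candidates[word], weights),
        reverse=True,
    )


# Made-up similarity scores for three hypothetical correction candidates.
candidates = {
    "alpha": (0.90, 0.60, 0.70),
    "beta": (0.70, 0.90, 0.80),
    "gamma": (0.50, 0.50, 0.50),
}

equal = rank(candidates, (1.00, 1.00, 1.00))      # same weights as test 1
scaled = rank(candidates, (0.90, 0.90, 0.90))     # same weights as test 3
edit_only = rank(candidates, (1.00, 0.00, 0.00))  # same weights as test 14

print(equal, scaled, edit_only)
```

Under this assumption, multiplying all three weighting factors by the same positive constant scales every candidate's score by that constant, so the ranking cannot change. Only the ratio between the factors matters, as the edit-distance-only ranking shows.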