
CSpell

CSpell Profiling Analysis

I. Introduction

Each of the 471 questions in the training set was run through CSpell and its elapsed time was recorded. This profiling information was then analyzed to optimize the performance of the software.
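The per-question timing described above can be sketched as a small harness. This is a minimal illustration, not CSpell's own profiling code: `profile_questions`, `slowest`, and the `correct` callable are hypothetical names, and the corrector is passed in as a plain function so that any implementation can be timed.

```python
import time
from typing import Callable, Dict, List, Tuple


def profile_questions(questions: Dict[str, str],
                      correct: Callable[[str], str]) -> Dict[str, float]:
    """Return the elapsed seconds of `correct` for each question, keyed by file name."""
    elapsed = {}
    for name, text in questions.items():
        start = time.perf_counter()          # monotonic, high-resolution clock
        correct(text)                        # hypothetical stand-in for the spell corrector
        elapsed[name] = time.perf_counter() - start
    return elapsed


def slowest(elapsed: Dict[str, float], n: int = 3) -> List[Tuple[str, float]]:
    """Return the n (file name, seconds) pairs with the largest elapsed time."""
    return sorted(elapsed.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Sorting the result with `slowest` is one simple way to locate the peaks that the following steps investigate.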

II. Processes

Tests with non-word correction only, on the training set:

  • Step 1: non-word option:
    • The X-axis is the questions used in the training set
    • The Y-axis is the elapsed time of CSpell correction for each question
    • Two peaks occur where the error token is very long. Because of the candidate generating algorithm (reversed edit distance), a long token produces too many permutations, which results in a long elapsed time.
      ID, file name    longest error token
      30416859.txt     gastreonterology-colonoscopy
      43466.txt        backwith-Wieddeman
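Why long tokens are expensive can be illustrated with a generic edit-distance candidate generator. The sketch below assumes the textbook enumeration of all strings within one or two edits (deletes, transposes, replaces, inserts); it is not CSpell's reversed edit distance algorithm, and the names `edits1` and `edits2` are illustrative only. It shows only how the candidate set grows with token length.

```python
import string


def edits1(token: str, alphabet: str = string.ascii_lowercase) -> set:
    """All strings one edit away from `token` (generic enumeration, not CSpell's)."""
    splits = [(token[:i], token[i:]) for i in range(len(token) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    inserts = {l + c + r for l, r in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def edits2(token: str) -> set:
    """All strings within two edits of `token`; grows roughly quadratically in length."""
    return {e2 for e1 in edits1(token) for e2 in edits1(e1)}


if __name__ == "__main__":
    # Print the candidate-set size for tokens of increasing length.
    for token in ("colo", "colono", "colonosc"):
        print(len(token), len(edits1(token)), len(edits2(token)))
```

Under this assumption the one-edit set grows linearly with token length and the two-edit set roughly with its square, which is consistent with the peaks seen for the longest error tokens.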

  • Step 2: set the max. length of spelling error to 10
    • To fix the above issue, we set the maximum length of an error token to 10 (configurable)
    • One peak occurs where the number of possible splits is large.
      ID, file name
      15413095.txt
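The growth in possible splits can be made concrete with a short sketch. It assumes that a "split" means cutting a token at one position to insert a space, so that a token of length n cut k times has C(n-1, k) variants; `split_candidates` and `count_splits` are hypothetical helpers, not CSpell functions.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List


def split_candidates(token: str, max_split: int) -> Iterator[List[str]]:
    """Yield every way to cut `token` into 2 .. max_split + 1 non-empty pieces."""
    positions = range(1, len(token))             # legal cut points between characters
    for k in range(1, max_split + 1):
        for cuts in combinations(positions, k):
            bounds = (0, *cuts, len(token))
            yield [token[a:b] for a, b in zip(bounds, bounds[1:])]


def count_splits(length: int, max_split: int) -> int:
    """Number of split variants for a token of the given length."""
    return sum(comb(length - 1, k) for k in range(1, max_split + 1))
```

With a single split allowed, the count is linear in the token length (`length - 1`); each additional allowed split adds a higher-order binomial term, which is why limiting the number of splits removes the peak.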

  • Step 3: set the max. split to 1
    • To fix the above issue, we set the maximum number of splits to 1 (configurable)
    • No obvious peak is found
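The two limits introduced in Steps 2 and 3 can be sketched as a guard applied before candidate generation. The class and field names below (`Limits`, `max_error_length`, `max_split`) are hypothetical and are not the keys used in CSpell's configuration file; the sketch also assumes the length limit is inclusive, which the text does not state.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the two configurable limits."""
    max_error_length: int = 10   # Step 2: longest token that is still corrected
    max_split: int = 1           # Step 3: most splits tried per token


def should_generate_candidates(token: str, limits: Limits = Limits()) -> bool:
    """True if the token is short enough to be worth generating candidates for."""
    return len(token) <= limits.max_error_length   # inclusive bound is an assumption


def tokens_to_correct(tokens: Iterable[str], limits: Limits = Limits()) -> List[str]:
    """Keep only the tokens that pass the length guard."""
    return [t for t in tokens if should_generate_candidates(t, limits)]
```

For example, `tokens_to_correct(["diabetis", "gastreonterology-colonoscopy"])` keeps only `"diabetis"`, so the long token never reaches the expensive candidate generation step.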

III. Conclusion

All of the observed peaks have expected causes, and the empirically best values are set in CSpell's default configuration file for best performance.