Text Categorization

PreProcess - JDI, phase III

This page describes the automated pre-processing tasks that generate the input files for JDI (Journal Descriptor Indexing). There are three phases of this pre-processing for JDI:
  • Phase I:
    Convert all Lisp files to the Java input format. This data set was tested by comparing it against all the Lisp files and the results for file.9801, and was used in tc2006.
  • Phase II:
    Use Java programs to generate the files from the original data (MEDLINE) and the Lisp files. This data set was tested by comparing it against all the Phase I files and the results for file.9801, and was used in tc2007.
  • Phase III:
    Use a Java program to generate the files from scratch (MEDLINE, Metathesaurus, etc.). This data set was tested by comparing its final files against those of Phase II for similarity (test suite), and has been used since tc2008.

The detailed procedures of the Phase III approach are described below: