Text Categorization

Text Categorization (TC), Java, UTF-8, 2011 Release:

04/11/2011

The TC (Text Categorization) project provides tools for high-level categorization based on the JDI (Journal Descriptor Indexing) methodology. JDI tools automatically categorize biomedical text given as input, returning a ranked list, with scores between 0 and 1, of either JDs (Journal Descriptors, corresponding to biomedical disciplines) or STs (UMLS® Semantic Types). StWsd, an ST-based WSD (word sense disambiguation) tool, has been part of the TC package since 2009. Applications include categorization by JD as pre-processing of text for NLP (natural language processing) and WSD according to ST.
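The shape of the output described above, a list of categories ranked by a score between 0 and 1, can be sketched as follows. This is only an illustration: the class JdRankingSketch, the nested ScoredCategory type, and the rank method are hypothetical names and not part of the TC package, and the scores are made-up placeholders rather than output from the real tools.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: none of these names come from the TC package.
final class JdRankingSketch {

    /** One category (a JD or an ST) paired with its score in [0, 1]. */
    static final class ScoredCategory {
        final String name;
        final double score;

        ScoredCategory(String name, double score) {
            if (score < 0.0 || score > 1.0) {
                throw new IllegalArgumentException("score must be in [0, 1]: " + score);
            }
            this.name = name;
            this.score = score;
        }

        @Override
        public String toString() {
            return name + "|" + score;
        }
    }

    /** Returns the categories sorted by descending score, highest first. */
    static List<ScoredCategory> rank(Map<String, Double> scores) {
        List<ScoredCategory> ranked = new ArrayList<ScoredCategory>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            ranked.add(new ScoredCategory(e.getKey(), e.getValue()));
        }
        Collections.sort(ranked, new Comparator<ScoredCategory>() {
            @Override
            public int compare(ScoredCategory a, ScoredCategory b) {
                return Double.compare(b.score, a.score);
            }
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up scores for illustration; not produced by the real JDI tools.
        Map<String, Double> scores = new LinkedHashMap<String, Double>();
        scores.put("Cardiology", 0.12);
        scores.put("Neurology", 0.57);
        scores.put("Pediatrics", 0.31);

        int position = 1;
        for (ScoredCategory c : rank(scores)) {
            System.out.println(position++ + " " + c);
        }
    }
}
```

The sketch covers only the ranking step; how the real tools compute each score is part of the JDI methodology and is not modeled here.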

The description of the JDI methodology provides further details.

JDI tools are based on research in the JDI project, where the tools were originally developed in Lisp.

The tools have since been developed in Java as part of the TC project for public interactive use and distribution as an open-source package.

  • Requirements:
    • Java V 1.6.0.21
    • Min. Required Disk Space: 12.0 GB

  • Documentation: