
Text Categorization

TC Package - Annual Release Procedures

This page describes the annual release procedure for the Text Categorization (TC) tools package with a new set of training data.

  1. Prepare tc${YEAR} baseline
    • Copy tc${PREV_YEAR} to tc${YEAR}
      shell> cp -rp ${TC}/tc${PREV_YEAR} ${TC}/tc${YEAR}
    • Change ${PREV_YEAR} to ${YEAR} in build.html files under ${TC}, ${TC}/examples, ${TC}/install
    • Change ${PREV_YEAR} to ${YEAR} in ${TC}/overview.html
    • Update ${TC}/data/Config/tc.properties, tc.properties.TEMPLATE
      => Try to build with shell> ant release (should be OK to build)
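The year substitutions in step 1 can be scripted. The following is a minimal sketch, assuming a POSIX shell. replace_year is a hypothetical helper, not part of the TC package, and it replaces every occurrence of the old year string, so review the changes before building.

```shell
#!/bin/sh
# Sketch only: replace_year is a hypothetical helper, not part of the TC package.
# Usage: replace_year PREV_YEAR YEAR FILE...
replace_year() {
    prev_year=$1
    year=$2
    shift 2
    for f in "$@"; do
        tmp="$f.tmp.$$"
        # Write to a temp file first so a failed sed never truncates the original,
        # then copy the content back so the file keeps its permissions.
        if sed "s/$prev_year/$year/g" "$f" > "$tmp"; then
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    done
}
```

For example: shell> replace_year ${PREV_YEAR} ${YEAR} ${TC}/build.html ${TC}/examples/build.html ${TC}/install/build.html ${TC}/overview.html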
  2. Update Lib/*.jar file
    • Update ${TC}/lib/Other/lvg${YEAR}api.jar
    • Update ${TC}/lvg${YEAR}lite
      => This is needed when running stWsd (unzip from lvg${YEAR}lite.tgz)
    • Update ${TC}/lib/jdbcDrivers/hsqldb.jar
  3. Update Java source code
    • Modify prolog of java files
      • Remove all SCRs-XX from history tag
      • Modify V-${PREV_YEAR} in version tag
        shell> cd ${LVG}/Components/BaselineCode/bin
        shell> ModifyTcJavaCode
        ${YEAR}
        1
        y

        => Build and test to make sure the result is the same as the last release
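The "same as last release" check can be done with a byte-for-byte file comparison. A minimal sketch, assuming the results of both releases have been saved to files; results_match and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: results_match is a hypothetical helper, not part of the TC package.
# Succeeds only when the two result files are byte-identical.
results_match() {
    if cmp -s "$1" "$2"; then
        echo "same: $1 $2"
    else
        echo "DIFFERENT: $1 $2" >&2
        # Show the first few differing lines to help locate the change.
        diff "$1" "$2" | head -n 20 >&2
        return 1
    fi
}
```

For example: shell> results_match result.${PREV_YEAR}.txt result.${YEAR}.txt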

  4. Update JDK/JRE
    • Download JDK from SUN
    • Install JDK to /usr/local/Applications/Java
    • Update symbolic link of /usr/bin/java
    • Update symbolic link of /usr/bin/javac
    • Update ${JAVA_HOME} in ~/.cshrc (for javadoc)

    • Update 2 JREs ${TC}/bin/jreDist/
      • Linux
      • Windows
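The symbolic link updates in step 4 can be sketched as follows. relink is a hypothetical helper and ${JDK_DIR} in the usage line is an assumed name for the new JDK directory under /usr/local/Applications/Java. Updating /usr/bin normally requires root privileges.

```shell
#!/bin/sh
# Sketch only: relink is a hypothetical helper, not part of the TC package.
# Usage: relink TARGET LINK
relink() {
    target=$1
    link=$2
    # Refuse to create a dangling link if the new JDK is not installed yet.
    [ -e "$target" ] || { echo "missing target: $target" >&2; return 1; }
    # -s: symbolic, -f: replace an existing link, -n: do not follow an existing link
    ln -sfn "$target" "$link"
}
```

For example: shell> relink ${JDK_DIR}/bin/java /usr/bin/java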
  5. Update Installation Program
    • Update ${project.year} in ${TC}/install/build.xml
    • Update ${TC}/install/sources/gov/nih/nlm/nls/tc/install/Setup/Param.java
      • VERSION
      • JRE_DIR
      • DATABASE_NAME
    • Update scripts in ${TC}/install/bin/*
      • TC_YEAR
      • JRE_VERSION
      • CLASSPATH
  6. Update DB
    • Download the latest version of HyperSQL DB (HSQLDB) and copy 3 files to ${TC}/lib/jdbcDrivers
      • hsqldb.jar
      • hsqldb_lic.txt
      • hypersonic_lic.txt
  7. Reload data to Database (if it is upgraded)
    • cd /bin/loadDb/
    • Change ${PREV_YEAR} to ${YEAR} in ${TC}/bin/loadDb/1.CreateDb
    • Load to new Database
      • Create DB
      • Load data to database
        • Word-Jd Scores
        • Mh-Jd Scores
        • Sh-Jd Scores
        • Word-St Scores
      • Change readonly=true in tc${YEAR}.properties
      • Check if the result is the same
  8. Integrate with New JDI Training Data
    • Generate JDI dataset, see TC Preprocess procedures
    • Load to new Database
      • Create DB
      • Change hsqldb.cache_file_scale=8 in tc${YEAR}.properties
      • Load data to database
        • Word-Jd Scores
        • Mh-Jd Scores
        • Sh-Jd Scores
      • Change readonly=true in tc${YEAR}.properties
    • Find and update Max. Signal
    • Test JDI similarity between ${YEAR} and ${PREV_YEAR}
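The property changes in steps 7 and 8 (readonly=true, hsqldb.cache_file_scale=8) can be applied with a small helper instead of a manual edit. A minimal sketch, assuming plain key=value lines and simple values without "|" or "&"; set_property is hypothetical and not part of the TC package.

```shell
#!/bin/sh
# Sketch only: set_property is a hypothetical helper, not part of the TC package.
# Usage: set_property FILE KEY VALUE
set_property() {
    file=$1
    key=$2
    value=$3
    tmp="$file.tmp.$$"
    # Escape dots so a key such as hsqldb.cache_file_scale matches literally.
    key_re=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    if grep -q "^$key_re=" "$file"; then
        # Key exists: rewrite its value in place.
        sed "s|^$key_re=.*|$key=$value|" "$file" > "$tmp"
    else
        # Key is missing: append it at the end.
        { cat "$file"; printf '%s=%s\n' "$key" "$value"; } > "$tmp"
    fi
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example: shell> set_property tc${YEAR}.properties readonly true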
  9. Integrate with New STI Training Data
    • Generate STI dataset, see TC Preprocess procedures
    • Use STRI to refine StDocument
    • Load data to database
      • Word-St Scores
    • Test STI and STRI through WSD data collection set
  10. Complete SCRs for ${YEAR} release
    • Update version ${YEAR}
      • ${TC_SRC}/Tools/Jdi.java
      • ${TC_SRC}/Tools/Sti.java
      • ${TC_SRC}/Tools/Stri.java
      • ${TC_SRC}/Tools/StWsd.java
      • ${TC_SRC}/Tools/Mlt.java
    • Update default value for Max. normalized signal (from observation of file wordSignalWcDcGt1.txt)
      • MAX_SIGNAL in ${TC_SRC}/FilterApi/LegalWordsOption.java
    • Update -rv:YEAR option
      • ${TC_SRC}/Lib/TcSystemOption.java
    • Standardize Java source code
      shell> cd ${LVG}/Components/BaselineCode/bin/
      shell> ModifyTcJavaCode
      ${YEAR}
      2
  11. Update other software components in the package
    • ${TC}/bin
      • Update ${YEAR} in ${TC}/bin/runProg
      • Update SCR_NO in ${TC}/bin/genBuildInfo
    • ${TC}/data
      • Modify ROOT_DIR=AUTO_MODE in ${TC}/data/Config/tc.properties
    • ${TC}/docs
      • Modify ${YEAR} in ${TC}/docs/updateDoc
    • Example code
      • Update ${TC_YEAR} in ${TC_EXAMPLE}/bin/runExample
  12. Installation Test
    • Update ${TC}/install/Msg/jdiGold.txt (for the new results)
      => This needs to be done after the new database is reloaded.
  13. Compile & Pack
    • shell> cd ${TC}
    • Update ${TC}/genBuildInfo

    • shell> ant clean
    • shell> ant release
    • shell> cd ..
    • shell> gtar -czvf tc${YEAR}.tgz tc${YEAR}
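The packing part of step 13 can be wrapped so that the archive is checked right after it is written. A minimal sketch: pack_release is a hypothetical helper, and it calls tar on the assumption that it accepts the same -c, -z, -f, and -t options as the gtar used above. Run it from the directory that contains tc${YEAR}.

```shell
#!/bin/sh
# Sketch only: pack_release is a hypothetical helper, not part of the TC package.
# Usage: pack_release YEAR   (run from the parent directory of tcYEAR)
pack_release() {
    year=$1
    dir="tc$year"
    archive="tc$year.tgz"
    [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
    # Create the gzip-compressed archive (assumes tar behaves like gtar here).
    tar -czf "$archive" "$dir" || return 1
    # Listing the archive confirms that it can be read back.
    tar -tzf "$archive" > /dev/null || return 1
    echo "packed: $archive"
}
```

For example: shell> pack_release ${YEAR}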
  14. Update web site
    • Update documents
      • Change ${PREV_YEAR} to ${YEAR} in ${TC_WEB}/${YEAR}/Home/topMenu*.html
      • Change ${PREV_YEAR} to ${YEAR} in ${TC_WEB}/${YEAR}/web/interactiveTools.html
  15. Test
    • Install tc${YEAR} to ${PROJECTS}
      • shell> mv tc${YEAR}.tgz ${PROJECTS}/TC
      • shell> gtar -xzvf tc${YEAR}.tgz
      • shell> cd ${PROJECTS}/TC/tc${YEAR}
      • shell> ./install/bin/install_linux
    • Add links to previous data sets
      • ln -sf /export/home/lu/Development/TC/tcData/data.2007 data.2007
      • ln -sf /export/home/lu/Development/TC/tcData/data.2008 data.2008
      • ln -sf /export/home/lu/Development/TC/tcData/data.2009 data.2009
      • ln -sf /export/home/lu/Development/TC/tcData/data.2010 data.2010
      • ln -sf ./data data.2011
    • Test all programs for all datasets
      • Test all programs: jdi, sti, stri, stWsd, mlt
      • Test all versions: -rv:${YEAR}
    • Test JDI dataset similarity
    • Test STI by WSD collection data set
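The links to previous data sets in step 15 follow one pattern per year, so they can be created in a loop. A minimal sketch: link_datasets is a hypothetical helper, and the current year's link (ln -sf ./data data.2011 above) is still created separately.

```shell
#!/bin/sh
# Sketch only: link_datasets is a hypothetical helper, not part of the TC package.
# Usage: link_datasets BASE_DIR FIRST_YEAR LAST_YEAR
# Creates data.YEAR links in the current directory pointing at BASE_DIR/data.YEAR.
link_datasets() {
    base=$1
    first=$2
    last=$3
    y=$first
    while [ "$y" -le "$last" ]; do
        ln -sfn "$base/data.$y" "data.$y"
        y=$((y + 1))
    done
}
```

For example: shell> link_datasets /export/home/lu/Development/TC/tcData 2007 2010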
  16. Update Web Applications
    • Web Tools:
    • TCAT: