Knowledge-intensive and statistical approaches to the retrieval and annotation of genomics MEDLINE citations.

Aronson AR, Demner-Fushman D, Humphrey SM, Ide NC, Kim W, Liu H, Loane RF, Mork JG, Smith LH, Tanabe LK, Wilbur WJ, Xie N
The Thirteenth Text Retrieval Conference (TREC 2004). 2004:503-11.
Abstract: 

Retrieving and annotating relevant information sources in the genomics literature are common but difficult tasks for biologists. The research presented here addresses these problems in two ways: by exploring methods for retrieving MEDLINE citations that answer real biologists' information needs, and by tackling the initial tasks required to annotate MEDLINE citations having genomic content with terms from the Gene Ontology (GO). We approached the retrieval task using two methods: aggressive, knowledge-intensive query expansion and text neighboring. Our approaches to the triage subtask for annotation consisted of traditional machine learning (ML) methods as well as a novel ML algorithm for thematic analysis. Finally, we used a statistical n-gram heuristic to decide which of the GO hierarchies should be used to annotate a given MEDLINE citation.

Aronson AR, Demner-Fushman D, Humphrey SM, Ide NC, Kim W, Liu H, Loane RF, Mork JG, Smith LH, Tanabe LK, Wilbur WJ, Xie N. Knowledge-intensive and statistical approaches to the retrieval and annotation of genomics MEDLINE citations. The Thirteenth Text Retrieval Conference (TREC 2004). 2004:503-11.