Gene Terms and English Words: An Ambiguous Mix
Continuing technical advances have made large-scale genetic experiments possible, producing data for thousands of genes at a time. Recognizing gene terms in biomedical text is therefore crucial for higher-level information applications. However, this task poses many challenges. One difficulty is handling the various kinds of ambiguity in gene and protein nomenclature. In this research we examine one of the most challenging kinds: gene terms that are also common English words. For example, TRAP, ART, and ACT are all gene symbols that also have English meanings. This kind of ambiguity makes retrieving relevant information more difficult. We describe information retrieval (IR)-based ranking methods applied to document sets retrieved for ambiguous gene terms in LocusLink, and present our results. We find that re-ranking the retrieved documents works best when the query uses the summary and product information from the LocusLink record in addition to the gene term.
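The re-ranking idea can be illustrated with a minimal sketch. The abstract does not state which ranking function was used, so this example assumes plain TF-IDF weighting with cosine similarity as a stand-in. The `summary` and `product` arguments are hypothetical plain-text fields standing in for the corresponding parts of a LocusLink record, and the two example documents are invented for illustration. The query is the gene term expanded with those fields, and each retrieved document is scored by its similarity to that query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    # Document frequency of each term across the retrieved document set.
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF so a term found in every document still gets weight 1.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in token_lists]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank(gene_term, summary, product, documents):
    # Hypothetical query expansion: gene term + summary + product text.
    query_tokens = tokenize(" ".join([gene_term, summary, product]))
    vectors, idf = tfidf_vectors([tokenize(d) for d in documents])
    # Weight query terms with the collection IDF; terms that occur in no
    # retrieved document cannot contribute to any score, so drop them.
    qvec = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
    return [(documents[i], scores[i]) for i in order]


# Invented example: one English-sense and one gene-sense use of "ART".
docs = [
    "The art exhibition featured modern paintings.",
    "ART protein mediates ADP-ribosylation of arginine residues on target proteins.",
]
ranked = rerank("ART",
                "ADP-ribosyltransferase that modifies arginine residues",
                "ADP-ribosyltransferase",
                docs)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

With only the bare term "ART" as the query, both documents would match on the single shared token. The added summary and product words ("adp", "arginine", "residues") are what push the gene-sense document to the top, which mirrors the intuition behind expanding the query with record text.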