Adapting A Monolingual Consumer Health System for Cross-Language Information Retrieval
This preliminary study applies a bilingual term list (BTL) approach to cross-language information retrieval (CLIR) in the consumer health domain and compares it with a machine translation (MT) approach. We compiled a Spanish-English BTL of 34,980 medical and general terms. We collected a training set of 466 general health queries from MedlinePlus en español and 488 domain-specific queries from ClinicalTrials.gov that were translated into Spanish. We submitted the training set queries in English against a test bed of 7,170 English ClinicalTrials.gov documents and compared MT and BTL against this English monolingual standard. The BTL approach was less effective (F = 0.420) than the MT approach (F = 0.578). A failure analysis of these results led us to substitute BTL dictionary sources and add rudimentary normalization of plural forms. These changes improved CLIR effectiveness on the same training set queries (F = 0.474) and yielded comparable results on a test set of 954 new queries (F = 0.484). These results will shape our efforts to meet Spanish speakers' needs for consumer health information that is currently available only in English.
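The abstract does not specify how BTL lookup, plural normalization, or the F score were implemented, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes greedy longest-match lookup of Spanish phrases in the term list, a naive rule that strips a final "es" or "s" when the singular form appears in the list, a small hypothetical stopword list, and a balanced F score that treats the documents retrieved by the English monolingual query as the reference set. The toy `BTL` entries, `STOPWORDS`, `MAX_PHRASE_LEN`, and all function names are hypothetical illustrations, not the study's actual 34,980-term resource or code.

```python
from typing import Dict, List, Set

# Hypothetical toy term list; the study's real BTL has 34,980 entries.
BTL: Dict[str, str] = {
    "cáncer de mama": "breast cancer",
    "cáncer": "cancer",
    "mama": "breast",
    "ensayo clínico": "clinical trial",
    "ensayo": "trial",
    "diabetes": "diabetes",
    "niño": "child",
}
STOPWORDS: Set[str] = {"de", "la", "el", "los", "las", "y", "en"}  # assumed
MAX_PHRASE_LEN = 3  # assumed longest multi-word entry worth trying


def normalize_plural(word: str, vocab: Set[str]) -> str:
    """Rudimentary plural normalization (assumed rule): if the word is not a
    known BTL token, strip a final "es" or "s" when that yields a known token."""
    if word in vocab:
        return word
    for suffix in ("es", "s"):
        if word.endswith(suffix) and word[: -len(suffix)] in vocab:
            return word[: -len(suffix)]
    return word


def translate_query(query: str, btl: Dict[str, str]) -> str:
    """Translate a Spanish query term by term with greedy longest-match lookup."""
    vocab = {tok for key in btl for tok in key.split()}
    tokens = [normalize_plural(t, vocab) for t in query.lower().split()]
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i : i + n])
            if phrase in btl:
                out.append(btl[phrase])
                i += n
                break
        else:
            # No entry matched: drop stopwords, pass other words through as-is.
            if tokens[i] not in STOPWORDS:
                out.append(tokens[i])
            i += 1
    return " ".join(out)


def f_measure(retrieved: Set[str], reference: Set[str]) -> float:
    """Balanced F (harmonic mean of precision and recall), assuming the
    monolingual English results serve as the reference set."""
    hits = len(retrieved & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(translate_query("Ensayos clínicos de cáncer de mama", BTL))
    print(translate_query("diabetes en niños", BTL))
    print(f_measure({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

With these toy entries, "Ensayos clínicos de cáncer de mama" becomes "clinical trial breast cancer": the plural rule maps "ensayos" and "clínicos" to forms that occur in the list, and longest-match lookup prefers "cáncer de mama" over translating "cáncer" and "mama" separately. Without the plural rule, both plural words would fall through untranslated, which illustrates the kind of miss that simple normalization can recover.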