
Improving an automatically extracted corpus for UMLS Metathesaurus word sense disambiguation.

Jimeno-Yepes A, Aronson AR

BioSEPLN 2010, Sept. 2010.

Abstract:

Manually annotated data is expensive, so annotating by hand a terminological resource as large as the UMLS Metathesaurus is infeasible. In this paper, we evaluate two approaches for improving the quality of an automatically extracted corpus used to train statistical learners for word sense disambiguation (WSD). The first contributes to more specific terms, while the second filters out false positives. Using both approaches, we obtained an improvement over the original automatically extracted corpus of approximately 6% in F-measure and 8% in recall.

