Identification of Investigator Name Zones Using SVM Classifiers and Heuristic Rules.
The research reported in biomedical articles often involves large numbers of investigators at different institutions. To properly credit these investigators, an article's authors frequently name them together in some part of the article. These Investigator Names (IN) now constitute a required field in the MEDLINE® citation for the article. A system developed by a research group at the U.S. National Library of Medicine automates the extraction of these names; it consists of three modules based on Support Vector Machine (SVM) classifiers and heuristic rules. The SVM classifiers label text blocks ("zones") that may contain Investigator Names, and the heuristic rules identify the actual zones. We collect eleven sets of word lists to train and test the classifiers, each set containing 100 to 56,000 words. Experimental results on online biomedical articles show Precision of 0.90, Recall of 0.95, F-Measure of 0.92, and Accuracy of 0.99.
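
As a quick consistency check on the reported figures, the sketch below recomputes the F-Measure from the stated Precision and Recall. It assumes the F-Measure is the balanced F1 score (the harmonic mean of Precision and Recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative and not taken from the described system.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the harmonic mean (F1).

    Assumption: the abstract's "F-Measure" is the balanced F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0:
        # Both inputs are zero: define the score as 0 to avoid division by zero.
        return 0.0
    return (1 + beta_sq) * precision * recall / denominator


# Values reported in the abstract.
reported_precision = 0.90
reported_recall = 0.95

f1 = f_measure(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.92
```

Under that assumption, the result rounds to the reported 0.92. Accuracy cannot be recomputed in the same way, because it also depends on the number of true negatives, which the abstract does not give.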