Naive Bayes Classifier for Extracting Bibliographic Information From Biomedical Online Articles
A Naive Bayes classifier has been developed to extract grant numbers, a key piece of bibliographic information, from online, HTML-formatted biomedical articles for the National Library of Medicine's MEDLINE database. Grant numbers identify research support from funding organizations and are part of MEDLINE citations. A total of 47,362 sentences were collected from articles cited in the MEDLINE database to train and test the classifier, and 4,721 words were identified as suitable features for classification. Experimental results were evaluated using three measures: precision, recall, and F-measure, each of which exceeds 98.05%.
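As an illustration of the general technique only, the following minimal sketch shows a word-feature Naive Bayes sentence classifier together with the three evaluation measures. It is not the authors' implementation: the multinomial model with add-one smoothing, the whitespace tokenizer, the labels "grant" and "other", the function names, and the four toy training sentences are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumed tokenizer: lowercase and split on whitespace.
    return sentence.lower().split()


def train(sentences, labels):
    # Count how often each word occurs in each class.
    vocab = set()
    word_counts = {}
    class_counts = Counter(labels)
    for sentence, label in zip(sentences, labels):
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(sentence):
            counts[word] += 1
            vocab.add(word)
    return {"vocab": vocab, "word_counts": word_counts,
            "class_counts": class_counts}


def predict(model, sentence):
    # Pick the class with the highest log posterior; words are treated
    # as conditionally independent given the class (the "naive" step).
    total = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + vocab_size  # add-one smoothing
        score = math.log(n / total)
        for word in tokenize(sentence):
            if word in model["vocab"]:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def precision_recall_f(gold, predicted, positive):
    # Precision, recall, and F-measure for the positive class.
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy data invented for this sketch; not taken from the MEDLINE collection.
TRAIN_SENTENCES = [
    "this work was supported by nih grant ca12345",
    "supported by grant gm54321 from the national institutes of health",
    "cells were cultured in standard medium",
    "the results are shown in table 2",
]
TRAIN_LABELS = ["grant", "grant", "other", "other"]

model = train(TRAIN_SENTENCES, TRAIN_LABELS)
```

Calling `predict(model, "this study was supported by grant hl11111")` labels the sentence with one of the two assumed classes, and `precision_recall_f` compares a list of predicted labels against a list of gold labels. A real system would still need a step that pulls the grant number itself out of each sentence classified as containing one, which this sketch does not attempt.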