Automated Labeling Of Biomedical Online Journal Articles
An automated labeling (AL) module has been developed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation, and abstract) from online biomedical journals for the National Library of Medicine's MEDLINE database. The AL module employs string matching, statistics, and fuzzy rule-based algorithms to label segmented zones in an article's HTML pages as specific bibliographic fields. Experiments conducted with 1,267 medical articles from 64 journal issues show an accuracy of about 97.71%.
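As a rough illustration of how string matching and fuzzy rules can be combined to label zones, the following minimal Python sketch scores each zone against four labels and picks the highest-scoring one. It is not the AL module's actual rule set: the `Zone` features (`font_size`, `order`), the keyword list, the thresholds, and the sample text (including the author names) are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    text: str
    font_size: float  # assumed feature: rendered font size in points
    order: int        # assumed feature: position of the zone on the page (0 = first)


def ramp(x, lo, hi):
    """Fuzzy membership that rises linearly from 0 at lo to 1 at hi."""
    return max(0.0, min(1.0, (x - lo) / (hi - lo)))


# Hypothetical keyword list for the string-matching rule.
AFFILIATION_WORDS = ("university", "department", "institute", "hospital")


def keyword_score(text, words):
    """String matching: 1.0 if any keyword occurs in the text, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(w in lowered for w in words) else 0.0


def score_zone(zone):
    """Return a fuzzy score in [0, 1] for each label (illustrative thresholds)."""
    n_words = len(zone.text.split())
    early = 1.0 - ramp(zone.order, 0, 5)  # zones near the top score higher
    return {
        # Title: large font AND near the top (fuzzy AND = min).
        "title": min(ramp(zone.font_size, 12, 18), early),
        # Authors: several commas AND a short zone.
        "authors": min(ramp(zone.text.count(","), 0, 2),
                       1.0 - ramp(n_words, 10, 30)),
        # Affiliation: contains an institutional keyword.
        "affiliation": keyword_score(zone.text, AFFILIATION_WORDS),
        # Abstract: starts with "Abstract" OR is long (fuzzy OR = max).
        "abstract": max(keyword_score(zone.text[:20], ("abstract",)),
                        ramp(n_words, 40, 120)),
    }


def label_zone(zone):
    """Assign the label with the highest fuzzy score."""
    scores = score_zone(zone)
    return max(scores, key=scores.get)


zones = [
    Zone("Automated Labeling Of Biomedical Online Journal Articles", 18, 0),
    Zone("J. Smith, A. Jones, M. Lee", 11, 1),
    Zone("Department of Radiology, Example University", 10, 2),
    Zone("Abstract: An automated labeling module extracts bibliographic "
         "data from online journals.", 10, 3),
]
labels = [label_zone(z) for z in zones]
print(labels)
```

Using `min` for a conjunction of conditions and `max` for a disjunction is one common convention for fuzzy rules; a production system would tune the memberships per journal layout and would likely add statistical features drawn from previously labeled issues.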