Syntactic-semantic question frames for cohort identification.
Large sets of electronic health record (EHR) data are increasingly used in retrospective clinical studies and comparative effectiveness research. The desired patient cohort characteristics for such studies are often described in free text. We present a syntactic-semantic approach to capturing free-text cohort characteristics in a structured frame format. We generated 60 topics to develop the approach and evaluated it on the 30 Institute of Medicine (IOM) priority topics for comparative effectiveness research that were provided for the Medical Records evaluation at the 2011 Text Retrieval Conference. We evaluated the accuracy of the frames as well as the modifications needed to achieve near-perfect precision in identifying the top 10 eligible patients. Our automatic approach accurately captured 29 of the 30 test questions, of which 21 needed no modification for finding eligible patients. Overall, the syntactic-semantic frames compared favorably to keyword searches when a domain-specific search engine was used for cohort selection.