Syntactic-semantic frames for clinical cohort identification queries.
Large sets of electronic health record data are increasingly used in retrospective clinical studies and comparative effectiveness research. The desired patient cohort characteristics for such studies are best expressed as free-text descriptions. We present a syntactic-semantic approach to structuring these descriptions as frames. We developed the approach on 60 training topics (descriptions) and evaluated it on 35 test topics provided within the 2011 TREC Medical Record evaluation. We evaluated the accuracy of the frames as well as the modifications needed to achieve near-perfect precision in identifying the top 10 eligible patients. Our automatic approach accurately captured 34 of the 35 test descriptions; 25 of the automatic frames needed no modifications for finding eligible patients. Further evaluations of the overall average retrieval effectiveness showed that frames are not needed for simple descriptions containing one or two key terms. However, our training results suggest that the frames are needed for more complex real-life cohort selection tasks.