Discoveries from Clinical Data

Large collections of clinical data from longitudinal research projects, electronic medical records, and health information exchanges provide opportunities to examine controversial findings from smaller-scale clinical studies and to conduct retrospective epidemiological studies in areas that lack clinical trials.

NLM established the goal of integrating biomedical, clinical, and public health information systems that promote scientific discovery and speed the translation of research into practice (NLM Long Range Plan 2006-2016, Goal 3). One of NLM's key recommendations for meeting this goal is to "develop linked databases for discovering relationships between clinical data, genetic information, and environmental factors."

LHNCBC's biostatistician and clinicians are using MIT's large longitudinal MIMIC-II database, which covers 33,000 patients with 40,000 intensive care unit (ICU) visits and 180 million rows of data, to answer clinical research questions. We also contributed standard clinical vocabulary code mappings to the latest MIMIC-II release (version 2.6).

We have completed a study of the impact of obesity on outcomes after critical illness; the results were published in the journal Critical Care.

Ongoing studies examine (1) the relationship between vitamin B12 levels and mortality and (2) the relationship among blood transfusions, feedings, and necrotizing enterocolitis (NEC) in newborns.

We developed and implemented natural language processing (NLP) algorithms to extract patients' smoking status and discharge destinations from the MIMIC-II physician discharge summaries. For the NEC study, we extracted information on episodes of neonatal apnea and bradycardia, as well as maternal history, from the clinical notes of infants in the neonatal intensive care unit (NICU). We also extracted data on hypertension and antihypertensive medications from free-text notes and compared those data with ICD-9 hypertension diagnosis codes to evaluate the underreporting of certain common conditions after ICU admission.
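A minimal sketch of a rule-based smoking-status classifier may help show what this kind of extraction involves. The keyword patterns, the sentence splitting, and the tie-breaking order (current over former over never) are all assumptions made for illustration. This is not the algorithm LHNCBC developed, and real discharge summaries need much richer negation and context handling.

```python
import re

# Hypothetical rules for illustration only; a real system would use curated
# lexicons and validated negation handling. Rules are checked in order, so
# negated and past-tense phrases win over the generic "smokes" pattern.
RULES = [
    ("never", re.compile(
        r"\b(never smoked|non-?smoker|denies (tobacco|smoking)"
        r"|no (history of )?(tobacco|smoking))\b", re.IGNORECASE)),
    ("former", re.compile(
        r"\b(former smoker|ex-?smoker|quit smoking|stopped smoking)\b",
        re.IGNORECASE)),
    ("current", re.compile(
        r"\b(current(ly)? smok\w*|active smoker|smokes"
        r"|\d+ ?(ppd|packs? per day))\b", re.IGNORECASE)),
]

# Assumed tie-breaking rule when sentences disagree.
PRIORITY = ["current", "former", "never"]


def sentence_status(sentence):
    """Return the label of the first rule that matches, or None."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return None


def smoking_status(note):
    """Classify a discharge summary as current, former, never, or unknown."""
    # Split on periods not followed by a digit (so "1.5" stays intact),
    # semicolons, and newlines.
    sentences = re.split(r"\.(?!\d)|[;\n]", note)
    found = {s for s in map(sentence_status, sentences) if s is not None}
    for label in PRIORITY:
        if label in found:
            return label
    return "unknown"


print(smoking_status("Social history: Former smoker, quit smoking in 1998."))
```

Checking negated and past-tense phrases first is the key design choice: a naive search for "smok" would label "never smoked" as a smoker.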

To make the data easier to integrate and analyze, LHNCBC researchers are applying NLM-supported clinical vocabulary standards to the MIMIC-II database. We mapped its laboratory tests to LOINC, its medications to RxNorm, and its radiology reports to the LOINC codes that describe each radiology study.
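The benefit of mapping to a shared vocabulary can be sketched briefly. Once each local laboratory code is linked to a standard code, results recorded under different local names can be pooled, and codes without a mapping stand out for manual review. The source names, local codes, and placeholder targets (LOINC:EXAMPLE-n) below are hypothetical. They are not actual MIMIC-II identifiers or real LOINC codes.

```python
from collections import defaultdict

# Hypothetical mapping table: (source system, local code) -> standard code.
# Targets are placeholders for illustration, not real LOINC codes.
LOCAL_TO_STANDARD = {
    ("icu_a", "50001"): "LOINC:EXAMPLE-1",
    ("icu_b", "GLUC"): "LOINC:EXAMPLE-1",   # same test, different local name
    ("icu_a", "50002"): "LOINC:EXAMPLE-2",
}


def pool_by_standard_code(rows, mapping):
    """Group result values by standard code.

    rows: iterable of (source, local_code, value) tuples.
    Returns (pooled, unmapped): pooled maps each standard code to its values;
    unmapped collects local codes that still need manual review.
    """
    pooled = defaultdict(list)
    unmapped = set()
    for source, local_code, value in rows:
        standard = mapping.get((source, local_code))
        if standard is None:
            unmapped.add((source, local_code))
        else:
            pooled[standard].append(value)
    return dict(pooled), unmapped


rows = [
    ("icu_a", "50001", 5.1),
    ("icu_b", "GLUC", 6.0),
    ("icu_a", "50002", 1.2),
    ("icu_b", "XYZ", 3.0),
]
pooled, unmapped = pool_by_standard_code(rows, LOCAL_TO_STANDARD)
print(pooled)    # {'LOINC:EXAMPLE-1': [5.1, 6.0], 'LOINC:EXAMPLE-2': [1.2]}
print(unmapped)  # {('icu_b', 'XYZ')}
```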

We are also developing a maximum likelihood (ML) statistical method that addresses measurement error in NLP-derived variables to reduce bias, which could increase the utility of NLP-derived data.
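As an illustration of the general idea, and not LHNCBC's method, consider a textbook misclassification setup. A binary NLP-derived variable (for example, "hypertension mentioned") has sensitivity se and specificity sp, both assumed to be known from a manually reviewed validation sample. The probability that a record is flagged is then se * pi + (1 - sp) * (1 - pi), where pi is the true prevalence. The sketch below maximizes the resulting binomial likelihood over pi. Its answer should agree with the closed-form Rogan-Gladen estimate, (observed proportion + sp - 1) / (se + sp - 1), clipped to [0, 1]. All numbers are made up.

```python
import math


def observed_positive_prob(pi, se, sp):
    """P(NLP flags a record as positive) given true prevalence pi."""
    return se * pi + (1.0 - sp) * (1.0 - pi)


def log_likelihood(pi, k, n, se, sp):
    """Binomial log-likelihood of k NLP-positive records out of n
    (the constant binomial coefficient is dropped)."""
    p = observed_positive_prob(pi, se, sp)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def mle_prevalence(k, n, se, sp, tol=1e-10):
    """Maximum likelihood estimate of the true prevalence pi in [0, 1].

    The log-likelihood is concave in p and p is affine in pi, so it is
    concave in pi and golden-section search finds the maximum.
    """
    if se + sp <= 1.0:
        # Otherwise p does not increase with pi and the estimate is meaningless.
        raise ValueError("classifier must beat chance: se + sp > 1")
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if log_likelihood(a, k, n, se, sp) < log_likelihood(b, k, n, se, sp):
            lo = a  # maximum lies in [a, hi]
        else:
            hi = b  # maximum lies in [lo, b]
    return (lo + hi) / 2.0


# Illustrative numbers only: 230 of 1,000 notes flagged by a hypothetical
# NLP tool with sensitivity 0.80 and specificity 0.90.
naive = 230 / 1000
corrected = mle_prevalence(230, 1000, se=0.80, sp=0.90)
print(f"naive={naive:.3f} corrected={corrected:.3f}")  # naive=0.230 corrected=0.186
```

In this example, the naive proportion overstates prevalence because false positives (1 - sp) inflate the flagged count. Correcting for known error rates removes that bias, at the cost of wider uncertainty.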

This LHNCBC research aligns closely with NIH's Big Data to Knowledge (BD2K) initiative, which "seeks to facilitate broad use of biomedical big data through new data sharing policies, catalogs of datasets, and enhanced training for early career scientists entering the new world of big data" by supporting "the management, analysis and integration of large-scale data and informatics."

Publications/Tools: 
Huser V, DeFalco FJ, Schuemie M, Ryan PB, Shang N, Velez M, Park RW, Boyce RD, Duke J, Khare R, Utidjian L, Bailey L. Multisite Evaluation of a Data Quality Tool for Patient-Level Clinical Data Sets. EGEMS (Wash DC). 2016 Nov 30;4(1):1239. doi: 10.13063/2327-9214.1239. eCollection 2016.
Jarlenski M, Baik SH, Zhang Y. Trends in Use of Medications for Smoking Cessation in Medicare, 2007-2012. Am J Prev Med. 2016 Sep;51(3):301-8. doi: 10.1016/j.amepre.2016.02.018. Epub 2016 Mar 30.
Driessen J, Baik SH, Zhang Y. Trends in Off-Label Use of Second-Generation Antipsychotics in the Medicare Population From 2006 to 2012. Psychiatr Serv. 2016 Aug 1;67(8):898-903. doi: 10.1176/appi.ps.201500316. Epub 2016 Apr 15.
Boyce RD, Voss EA, Huser V, Evans L, Reich C, Duke JD, Tatonetti NP, Lorberbaum T, Dumontier M, Hauben M, Wallberg M. LAERTES: An open scalable architecture for linking pharmacovigilance evidence sources with clinical data. Proc International Conference on Biomedical Ontology and BioCreative (ICBO BioCreative 2016). http://icbo2016.cgrb.oregonstate.edu/node/354.
Hripcsak G, Ryan PB, Duke JD, Shah NH, Park RW, Huser V, Suchard MA, Schuemie MJ, DeFalco FJ, Perotte A, Banda JM, Reich CG, Schilling LM, Matheny ME, Meeker D, Pratt N, Madigan D. Characterizing treatment pathways at scale using the OHDSI network. Proc Natl Acad Sci U S A. 2016 Jul 5;113(27):7329-36. doi: 10.1073/pnas.1510502113. Epub 2016 Jun 6.
Benedict SH, Hoffman K, Martel MK, Abernethy AP, Asher AL, Capala J, Chen RC, Chera B, Couch J, Deye J, Efstathiou JA, Ford E, Fraass BA, Gabriel PE, Huser V, Kavanagh BD, Khuntia D, Marks LB, Mayo C, McNutt T, Miller RS, Moore KL, Prior F, Roelofs E, Rosenstein BS, Sloan J, Theriault A, Vikram B. Overview of the American Society for Radiation Oncology-National Institutes of Health-American Association of Physicists in Medicine Workshop 2015: Exploring Opportunities for Radiation Oncology in the Era of Big Data. Int J Radiat Oncol Biol Phys. 2016 Jul 1;95(3):873-879. doi: 10.1016/j.ijrobp.2016.03.006.
Fung KW, Richesson R, Smerek M, Pereira KC, Green BB, Patkar A, Clowse M, Bauck A, Bodenreider O. Preparing for the ICD-10-CM Transition: Automated Methods for Translating ICD Codes in Clinical Phenotype Definitions. EGEMS (Wash DC). 2016 Apr 12;4(1):1211. doi: 10.13063/2327-9214.1211. eCollection 2016.
Hume S, Aerts J, Sarnikar S, Huser V. Current applications and future directions for the CDISC Operational Data Model standard: A methodological review. J Biomed Inform. 2016 Apr;60:352-62. doi: 10.1016/j.jbi.2016.02.016. Epub 2016 Mar 2.
Kilicoglu H, Demner-Fushman D. Bio-SCoRes: A Smorgasbord Architecture for Coreference Resolution in Biomedical Text. PLoS One. 2016 Mar 2;11(3):e0148538. doi: 10.1371/journal.pone.0148538. eCollection 2016.
Driessen J, Baik SH, Zhang Y. Explaining Improved Use of High-Risk Medications in Medicare Between 2007 and 2011. J Am Geriatr Soc. 2016 Mar;64(3):674-6. doi: 10.1111/jgs.14000.