How to Develop Prediction Models Using Electronic Medical Records

Speaker:  Uri Kartoun – Cambridge, MA, United States
Topic(s):  Information Systems, Search, Information Retrieval, Database Systems, Data Mining, Data Science


Recent advances in computer hardware and software, together with the growing accessibility of electronic medical records (EMRs), have accelerated research on predicting patient outcomes. These advances have enabled the rapid development of massive-scale predictive models, which are powerful resources for studying disease complications at the population level. Such models have proven highly useful for discovering or confirming disease correlations, disease subcategories, and adverse drug events. In addition to structured data (e.g., medication prescriptions and diagnosis and procedure codes), EMRs contain narrative text such as physician notes. Reliably extracting mentions of clinical and behavioral disease concepts and measurements from these notes can add important details to individuals’ clinical contexts and disease burdens.
As a research fellow at Massachusetts General Hospital (2013–2016) and a research staff member at IBM Research (2016–present), I have developed prediction models by applying machine-learning algorithms to large collections of structured and unstructured data. In this lecture, I will walk through how to develop prediction models using longitudinal EMRs, and how to develop and validate new risk scores.
As a use case, I will go through the steps we took to create the MELD-Plus risk score [1]. I introduced MELD-Plus for the first time at the Liver Meeting in Washington, DC, in October 2017, where I proposed it as a potential replacement for outdated methods. I believe that, if adopted by the United Network for Organ Sharing, MELD-Plus will extend the lives of many patients suffering from end-stage liver disease.
[1] Kartoun, U., Corey, K., Simon, T., Zheng, H., Aggarwal, R., Ng, K., Shaw, S. The MELD-Plus: A generalizable prediction risk score in cirrhosis. PLOS ONE 12, 10 (Oct. 2017).

About this Lecture

Number of Slides:  30
Duration:  45 minutes
Languages Available:  English
