Using Machine Learning to Predict Outcomes for Sepsis Patients

This machine learning model can help identify well-known associations with sepsis death even among the noise of many unrelated variables.

Sepsis is a life-threatening condition that arises when the body's response to an infection injures its own tissues and organs. It is a complex syndrome that is difficult to identify early, as its symptoms, such as fever and low blood pressure, overlap with those of other common illnesses. Without timely treatment, sepsis can progress to septic shock, which has a hospital mortality rate greater than 40%.

Understanding which sepsis patients are at the highest risk for death could be useful for clinicians in prioritizing care. Our team partnered with researchers from Geisinger Healthcare System to build a model to predict in-hospital or 90-day-post-discharge all-cause mortality among hospitalized sepsis patients using historic electronic healthcare record (EHR) data. This model could provide guidance to medical teams to monitor carefully and take any preventive measures possible for those patients that have a high probability prediction of death.

Data Science Environment

We used IBM Data Science Experience, a collaborative environment with the tools needed to ingest, visualize, and build models with heterogeneous data sources. It gives data scientists a choice among the three most popular languages (Python, Scala, and R) and notebooks (Jupyter and Zeppelin), and includes IBM value-added functionalities and data science community features. IBM Data Science Experience operationalizes models for real-time or batch scoring and consumption by business applications. It also has the capability to integrate a feedback loop for continuous model monitoring and re-training.

Gathering and Preparing Data

Geisinger provided de-identified data files on over 10,000 patients diagnosed with sepsis between 2006 and 2016. These patients were either admitted to the hospital with sepsis or acquired sepsis during hospitalization. The data included demographics, inpatient and outpatient Geisinger Healthcare System visits, surgical procedures, medical history, cultures (bacteria), medications, transfers between hospital units, social history (e.g. tobacco and alcohol use), vital measures, and lab results.

Per patient, we selected the most recent hospitalization and associated data from the various sources for that hospitalization. This included specific information on events during the hospitalization, such as the type and location of surgery and the culture location and bacteria found from cultures. We also derived summarized information on events preceding the hospitalization, e.g. the number of surgical procedures 30 days prior to hospitalization. No data after discharge was used. Figure 1 summarizes these time-based decisions.

Figure 1: Time-based decisions for data used and for predictions.

After combining the provided datasets, the resulting dataset included 10,599 rows, one per patient, and 199 attributes or features per patient.
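
The per-patient selection and the 30-day look-back described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the record layout, field names (`patient_id`, `admit_date`, `procedure_date`), and toy values are hypothetical, since the article does not describe the actual schema of the Geisinger files.

```python
from datetime import date, timedelta

# Hypothetical toy records; field names and values are assumptions for illustration.
admissions = [
    {"patient_id": 1, "admit_date": date(2014, 3, 1)},
    {"patient_id": 1, "admit_date": date(2015, 6, 10)},
    {"patient_id": 2, "admit_date": date(2012, 1, 20)},
]
procedures = [
    {"patient_id": 1, "procedure_date": date(2015, 5, 20)},  # 21 days before latest admission
    {"patient_id": 1, "procedure_date": date(2015, 6, 12)},  # during the hospitalization
    {"patient_id": 1, "procedure_date": date(2014, 2, 25)},  # tied to an older admission
    {"patient_id": 2, "procedure_date": date(2011, 11, 1)},  # more than 30 days before
]


def latest_admissions(rows):
    """Keep only the most recent admission per patient."""
    latest = {}
    for row in rows:
        current = latest.get(row["patient_id"])
        if current is None or row["admit_date"] > current["admit_date"]:
            latest[row["patient_id"]] = row
    return latest


def procedures_in_prior_days(latest, procs, days=30):
    """Count procedures in the `days` preceding each patient's latest admission."""
    counts = {patient_id: 0 for patient_id in latest}
    for proc in procs:
        admission = latest.get(proc["patient_id"])
        if admission is None:
            continue
        window_start = admission["admit_date"] - timedelta(days=days)
        # Only events strictly before admission and inside the window are counted.
        if window_start <= proc["procedure_date"] < admission["admit_date"]:
            counts[proc["patient_id"]] += 1
    return counts


latest = latest_admissions(admissions)
features = procedures_in_prior_days(latest, procedures)
print(features)
```

Each patient ends up with exactly one row of derived features, which matches the one-row-per-patient shape of the combined dataset.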

Predictive Model

After cleaning the data and applying feature selection, we defined our objective as a binary classification problem: predict death during hospitalization through 90 days after discharge among sepsis patients.

For the algorithm to be used, we selected gradient boosted trees using the XGBoost package, which has been dominating popular machine learning competitions given its execution speed and robust performance. Another motivation for using XGBoost is the ability to fine-tune hyper-parameters in order to improve the performance of the model. Within the training data, we used ten-fold cross-validation and GridSearchCV to select parameter values in an iterative manner to maximize the area under the ROC curve (AUC). A practical example of this process in IBM Data Science Experience is here.

We split the data into training (60%) and testing (40%) sets. Using the hyper-parameters tuned on the training data, we applied the model to the test data, which produced the performance shown in Figure 2.

Figure 2: Performance of our XGBoost model.
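
The split-then-tune workflow described above might look roughly like the sketch below. Several things here are assumptions rather than the setup we used: the data is synthetic, the parameter grid is purely illustrative (the article does not list the values searched), stratification is an added choice, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example depends on a single library. XGBoost's `XGBClassifier` follows the same estimator interface and could be substituted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption); the real dataset had 10,599 rows and 199 features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)

# 60% training / 40% testing. Stratifying on the outcome is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Illustrative grid only; not the values searched in the original work.
param_grid = {"max_depth": [2, 3], "learning_rate": [0.05, 0.1]}

search = GridSearchCV(
    estimator=GradientBoostingClassifier(n_estimators=50, random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",  # pick the combination with the best cross-validated AUC
    cv=10,              # ten-fold cross-validation within the training data
)
search.fit(X_train, y_train)

# The test set is used only once, after tuning is finished.
test_scores = search.best_estimator_.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print(search.best_params_, round(test_auc, 3))
```

Keeping the test set out of the grid search is what makes the reported test AUC an estimate of performance on unseen patients rather than a by-product of tuning.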

What do these numbers mean?

For the AUC (area under the ROC curve) score, the closer this number is to 1, the better the model separates patients who died from those who survived across all classification thresholds; a value of 0.5 corresponds to random guessing. An AUC of 0.8561 on the test data means that, given one randomly chosen patient who died and one who survived, the model assigned the higher risk score to the patient who died about 86% of the time, so high-risk patients could be targeted with adequate treatment.
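
One way to read an AUC value is as the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor. The sketch below computes AUC directly from that definition by comparing every positive/negative pair. The labels and scores are made-up toy values, and the function name is hypothetical; this is not how the figure above was computed, only an illustration of what the number measures.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumption): 1 = died, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

auc = pairwise_auc(y_true, scores)
print(round(auc, 4))  # 11 of the 12 pairs are ranked correctly
```

Because the measure depends only on how patients are ranked, it says nothing about which probability cutoff to use when flagging patients; that choice is governed by precision and recall.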

Precision and recall can be summarized together by the area under the precision-recall (PR) curve. The closer this number is to 1.0, the better the model balances precision (also known as positive predictive value) and recall (also known as sensitivity). In our case, the number was 0.80. We favored high recall: the intent was to minimize the number of patients missed by this model who could eventually die due to sepsis.
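
The trade-off behind favoring recall can be illustrated with a small sketch: lowering the probability threshold at which a patient is flagged catches more of the patients who die (higher recall) at the cost of more false alarms (lower precision). The labels, scores, and thresholds below are toy values chosen for illustration, not figures from our model.

```python
def precision_recall(y_true, scores, threshold):
    """Precision and recall when patients scoring >= threshold are flagged."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (assumption): 1 = died, 0 = survived.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.35, 0.6, 0.4, 0.32, 0.1]

strict = precision_recall(y_true, scores, threshold=0.5)
lenient = precision_recall(y_true, scores, threshold=0.3)

print("threshold 0.5:", strict)   # fewer false alarms, but two deaths missed
print("threshold 0.3:", lenient)  # no deaths missed, but more false alarms
```

In a setting where a missed high-risk patient is costlier than an unnecessary alert, the lower threshold is the more defensible choice.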

Another metric we used was the model's accuracy. We used bootstrapping to generate 1,000 variations of the training and testing datasets, ran the XGBoost model on each, and recorded the accuracy of each run. The distribution of the bootstrapped accuracy over 1,000 runs gave us a 95% confidence interval on accuracy between 0.77 and 0.79, which means our model correctly classified over three-quarters of patients (true positives and true negatives combined).
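
A percentile bootstrap of this kind can be sketched as below. Note the simplification: instead of regenerating training and testing datasets and refitting the model on each run, this version resamples the per-patient correct/incorrect outcomes of a single fixed model, which is cheaper but captures less of the variability. The input data and function name are hypothetical.

```python
import random


def bootstrap_accuracy_ci(correct, n_runs=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy.

    `correct` holds 1 for each correctly classified patient and 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(correct)
    accuracies = []
    for _ in range(n_runs):
        # Resample patients with replacement and recompute accuracy.
        sample = rng.choices(correct, k=n)
        accuracies.append(sum(sample) / n)
    accuracies.sort()
    lower = accuracies[int((alpha / 2) * n_runs)]
    upper = accuracies[int((1 - alpha / 2) * n_runs) - 1]
    return lower, upper


# Toy outcomes (assumption): 780 of 1,000 test patients classified correctly.
correct = [1] * 780 + [0] * 220
lower, upper = bootstrap_accuracy_ci(correct)
print(f"95% CI for accuracy: {lower:.3f} to {upper:.3f}")
```

The width of the interval shrinks as the test set grows, which is one reason a larger test set gives a tighter interval than this toy example.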

In addition to the numbers and their interpretation explained above, the confusion matrix for this model is shown in Figure 3. It shows that for the test data, our model identified 1,190 patients as true positives (prediction of death for patients who actually died) and 2,087 as true negatives (prediction of survival for patients who actually survived).

Figure 3: Positive and negative predictions.
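
As a quick consistency check, the true positive and true negative counts can be related back to the accuracy figure. The arithmetic below assumes the test set is 40% of the 10,599 patients rounded up to a whole patient; the article does not state the exact test-set size, so that number is an assumption.

```python
import math

total_patients = 10_599
test_fraction = 0.40
true_positives = 1_190
true_negatives = 2_087

# Assumption: 40% of the rows, rounded up to a whole patient.
test_size = math.ceil(total_patients * test_fraction)

correct = true_positives + true_negatives
accuracy = correct / test_size
misclassified = test_size - correct  # false positives plus false negatives

print(test_size, correct, misclassified, round(accuracy, 3))
```

Under that assumption the accuracy works out to roughly 0.773, which sits inside the bootstrapped 95% confidence interval of 0.77 to 0.79.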

We also used XGBoost's capability to determine feature importance using the "cover" importance type. This measure does not indicate whether a feature is a strong predictor of death or a strong predictor of survival, but it is still very useful, as it shows the expected percentage of patients for which that feature is used in predicting death.

For example, as seen in Figure 4, the "Age at hospital admission" feature is used in predicting death for 29.5% of patients.

Figure 4: Feature importance of the 20 most important features in the final model.
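
To make the idea of cover-based importance concrete, the sketch below averages the cover of every split that uses a feature and then rescales the results to percentages. In XGBoost itself these values come from the trained model, for example via `Booster.get_score(importance_type="cover")`; the split list, feature names, and cover values here are invented for illustration, and the rescaling to percentages is an assumption about presentation rather than a documented step of our analysis.

```python
from collections import defaultdict

# Hypothetical (feature, cover) pairs, one per split across all trees.
splits = [
    ("age_at_admission", 900.0),
    ("age_at_admission", 500.0),
    ("hours_on_vasopressors", 400.0),
    ("num_transfers", 200.0),
]


def average_cover(split_list):
    """Average cover over all splits in which each feature is used."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, cover in split_list:
        totals[feature] += cover
        counts[feature] += 1
    return {feature: totals[feature] / counts[feature] for feature in totals}


def as_percentages(importance):
    """Rescale importance values so they sum to 100."""
    total = sum(importance.values())
    return {feature: 100.0 * value / total for feature, value in importance.items()}


cover = average_cover(splits)
percentages = as_percentages(cover)

# List features from most to least important.
for feature, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{feature}: {pct:.1f}%")
```

A feature with high cover is one whose splits affect many patients, which is why it ranks highly even though the measure is silent on the direction of the effect.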

We conducted further exploratory analysis to examine how features were distributed with respect to the outcome variable (death). While these plots are helpful to visualize a high-level relationship with the outcome, it's important to understand that XGBoost trains multiple decision trees, which are non-linear in nature. So, important features in an XGBoost model may not have an obvious relationship with the outcome variable in these exploratory plots.

For example, as seen in Figure 5, a feature such as "Age at hospital admission" may suggest that older patients have a higher proportion of deaths compared to younger patients. As another example, the "Hours spent on vasopressors" feature may suggest that patients who took vasopressors longer had higher death rates, but these deaths could equally have been due to the severity of their health condition (e.g. if the sepsis progressed to septic shock), thus requiring them to be on vasopressors for a longer duration.

Figure 5: Patient deaths related to some of the most important features.
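
An exploratory view like the age plot can be approximated by grouping patients into bands and computing the proportion of deaths in each. The sketch below uses made-up (age, died) pairs and a hypothetical 20-year band width; it shows the kind of summary behind such a plot, not our actual data.

```python
from collections import defaultdict

# Toy (age at admission, died) pairs; values are assumptions for illustration.
patients = [
    (34, 0), (41, 0), (47, 1), (58, 0), (63, 1),
    (66, 0), (72, 1), (79, 1), (85, 1), (88, 0),
]


def death_rate_by_band(rows, band_width=20):
    """Proportion of deaths within each age band, ordered by age."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for age, died in rows:
        band_start = (age // band_width) * band_width
        totals[band_start] += 1
        deaths[band_start] += died
    return {
        f"{start}-{start + band_width - 1}": deaths[start] / totals[start]
        for start in sorted(totals)
    }


rates = death_rate_by_band(patients)
for band, rate in rates.items():
    print(f"ages {band}: {rate:.0%} died")
```

Such a summary describes an association only; as with vasopressor duration, a higher rate in one group may reflect underlying severity rather than the feature itself.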

The decision tree rules output by XGBoost can be used to help further understand how to target patients for treatment. For example, the medical team may provide special attention to older patients due to their higher mortality risks, may monitor the duration of vasopressors taken, may try to reduce the number of patient transfers between hospital departments in order to minimize the impact on susceptible patients, and so on.


Predicting all-cause death in sepsis patients can guide health providers to actively monitor and take preventive actions to improve patients' survival. Many of the features that were identified as important in our model are known to be associated with sepsis patients' death. This provides reassurance that our machine learning model can help identify well-known associations with sepsis death even among the noise of many unrelated variables. However, in this analysis, we excluded features from key data sources that had a lot of missing data, including lab results and vitals. We expect the model performance to improve by adding those features later. We will continue our collaborative work with Geisinger to analyze an updated and more comprehensive set of clinical variables and continue to further improve our model and its clinical utility. With more interventional features, we hope to produce a more actionable model that can assist Geisinger in their care for sepsis patients.


Many thanks to the IBM Academy of Technology and our IBM executive sponsor Rob Thomas for approving this initiative, where we had participation in our weekly calls from Debdipto Misra, Bipin Karunakaran, Rameswara Sashi Challa, and Satish Kalepalli from Geisinger, and Shantan Kethireddy, Aleksandr Petrov, Wanting Wang, Rajiv Joshi, Cheranellore Vasudevan, Alan Newman, and Vidhya Shankar from IBM.
