Extracting Insights From Clinical Notes

In this article, we discuss how the healthcare industry can use NLP and the Elastic Stack to extract better insights from unstructured clinical notes.

Healthcare Challenges

Electronic health records (EHRs) have become more sophisticated and feature-rich, but despite these advances, doctors still prefer the simplicity of summarizing patient encounters in free-form narratives, clinical notes, and text. As a result, about 80 percent of healthcare data is unstructured.

In addition, many healthcare organizations house their data across disparate sources, which makes it difficult to gain holistic patient insights. The volume of data that has never been processed or used is also vast, so consuming and processing historical data that goes back 10 to 20+ years is a significant technical challenge.

The important questions healthcare organizations are asking include:

  • How do we get more value out of our clinical notes (where 80% of our data insights reside)?
  • How do we merge clinical data with claims and other data to get a holistic view of patients?
  • How do we handle huge volumes of data so that we can do all of this in a scalable way?

How Can Natural Language Processing (NLP) Help?

To improve operations and patient outcomes and to boost revenue, healthcare organizations need to convert unstructured narratives and text (from patient encounters or imaging reports, for example) into a structured format that identifies valuable clinical terms and codes that can be easily searched and analyzed.

First, this enables analysts and clinical researchers to find patient information quickly using data that was once locked away in physicians' clinical notes. Second, once the notes have been structured, the data can be used for predictive analytics to identify and manage risks for individual patients and high-priority cohorts.
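As a rough illustration of turning a note into structured, searchable concepts, the Python sketch below matches terms from a small dictionary and marks a concept as negated when a cue such as "denies" or "no" precedes it in the same sentence (a much-simplified version of NegEx-style rules). The lexicon, its code mappings, the `Concept` record, and the `extract_concepts` function are hypothetical; production clinical NLP relies on curated terminologies and far richer rules.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical lexicon for illustration only: surface terms mapped to
# ICD-10-CM codes. A real system would use a curated terminology.
LEXICON = {
    "type 2 diabetes": "E11.9",
    "hypertension": "I10",
    "copd": "J44.9",
    "heart failure": "I50.9",
}

# Simplified negation cues: a concept preceded by one of these words in the
# same sentence is recorded as negated rather than dropped.
NEGATION_CUES = re.compile(r"\b(?:no|denies|negative for|without)\b")


@dataclass(frozen=True)
class Concept:
    note_id: str
    term: str
    code: str
    negated: bool


def extract_concepts(note_id: str, text: str) -> List[Concept]:
    """Convert free text into structured concept records (one per mention)."""
    concepts = []
    # Split on rough sentence boundaries so negation cannot leak across them.
    for sentence in re.split(r"[.;\n]", text.lower()):
        for term, code in LEXICON.items():
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                negated = bool(NEGATION_CUES.search(sentence[: match.start()]))
                concepts.append(Concept(note_id, term, code, negated))
    return concepts


note = "Pt with type 2 diabetes and hypertension. Denies COPD."
for concept in extract_concepts("note-1", note):
    print(concept)
```

Each record can then be stored as a structured row or search document. Negated mentions are kept so they can be audited, but they can be excluded from analytics.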

To solve this problem, there are three important puzzle pieces:

  • The ability to perform Natural Language Processing (NLP) and apply sophisticated clinical rules to extract insights and store them as structured, clinical concepts.
  • The ability to perform this NLP operation at scale, both to handle a multi-year backlog of unstructured data and to seamlessly integrate that data with other valuable data types such as claims or genomics.
  • The ability to build elegant, simple, interactive interfaces that support the clinical workflow and allow rapid search and real-time analytics.

[Figure: NLP of clinical notes workflow]

Opportunities Created by Marrying Clinical Data With Other, High-Value Data Types 

By integrating claims data and the deep insights from clinical notes, healthcare organizations can begin to address some of their analytic and clinical workflow challenges, such as:

  • Undercoding or overcoding.
  • Timely patient follow-ups, particularly for high-risk patients.
  • Rapid identification of patients in need of improved care coordination or intervention.
  • Accurate risk adjustment for complex cases, particularly patients with multiple condition codes.
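One way to picture this integration is a per-patient profile that combines billed codes from claims with codes derived from notes. In the minimal sketch below, the input shapes, the `build_patient_profiles` and `care_coordination_candidates` names, and the two-condition threshold are all assumptions chosen for illustration. A real risk model would weigh conditions rather than simply count them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Profile = Dict[str, Set[str]]


def build_patient_profiles(
    claims: Iterable[Tuple[str, str]],
    note_concepts: Iterable[Tuple[str, str, bool]],
) -> Dict[str, Profile]:
    """Merge billed codes and note-derived codes into one profile per patient.

    claims:        (patient_id, code) pairs from the claims system
    note_concepts: (patient_id, code, negated) triples from clinical NLP
    """
    profiles: Dict[str, Profile] = defaultdict(
        lambda: {"billed": set(), "documented": set()}
    )
    for patient_id, code in claims:
        profiles[patient_id]["billed"].add(code)
    for patient_id, code, negated in note_concepts:
        if not negated:  # a negated mention is not evidence of the condition
            profiles[patient_id]["documented"].add(code)
    return dict(profiles)


def care_coordination_candidates(
    profiles: Dict[str, Profile], min_conditions: int = 2
) -> List[str]:
    """Return patients whose combined distinct codes meet a hypothetical threshold."""
    return sorted(
        pid
        for pid, profile in profiles.items()
        if len(profile["billed"] | profile["documented"]) >= min_conditions
    )


claims = [("p1", "I10"), ("p2", "E11.9")]
notes = [("p1", "E11.9", False), ("p2", "J44.9", True)]
profiles = build_patient_profiles(claims, notes)
print(care_coordination_candidates(profiles))  # ['p1']
```

Patient p1 is flagged only because the diabetes documented in the notes is combined with the hypertension on the claim. Neither source alone reaches the threshold.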

Using NLP to Address Miscoding

Undercoding and overcoding are important challenges that healthcare organizations are looking to address using NLP.

Undercoding of diagnoses, procedures, and evaluation and management services occurs when the billed codes do not represent the full scope of the work performed by the physician or facility. This commonly happens in complex cases, where the patient has pre-existing issues or complex conditions, or where multiple treatments are performed as part of the encounter.

Manual coding of these types of encounters is prone to human error, particularly when nurse coders are reviewing hundreds of patient records a day, and the resulting undercoding means lost revenue and reimbursements. Extracting insights from clinical notes can help resolve discrepancies between what the physician performed, what is documented in the notes, and what was actually billed. Addressing undercoding helps healthcare providers find revenue opportunities and ensure their claims match the services and treatments that were actually rendered.

Overcoding of diagnoses, procedures, and evaluation and management services occurs when codes are reported in a way that results in a higher payment than the services or treatment the provider actually rendered would justify. When done intentionally, this is considered fraud, but overcoding often happens through manual error, oversight, or missing information in the patient record.
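At its core, the reconciliation described above compares two code sets per encounter: codes supported by the clinical note and codes on the claim. The sketch below assumes both sets are already normalized to the same code format. The function names and data shapes are hypothetical, and the output is a worklist of leads for a human coder. It does not determine undercoding, overcoding, or fraud.

```python
from typing import Dict, Iterable, Set, Tuple


def reconcile_codes(documented: Set[str], billed: Set[str]) -> Dict[str, Set[str]]:
    """Compare note-supported codes with billed codes for one encounter."""
    return {
        # Supported by the note but missing from the claim.
        "possible_undercoding": documented - billed,
        # On the claim but not supported by the note.
        "possible_overcoding": billed - documented,
        "matched": documented & billed,
    }


def encounters_needing_review(
    encounters: Dict[str, Tuple[Iterable[str], Iterable[str]]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Keep only encounters with a discrepancy in either direction.

    encounters maps encounter_id -> (documented_codes, billed_codes).
    """
    flagged = {}
    for encounter_id, (documented, billed) in encounters.items():
        result = reconcile_codes(set(documented), set(billed))
        if result["possible_undercoding"] or result["possible_overcoding"]:
            flagged[encounter_id] = result
    return flagged


worklist = encounters_needing_review({
    "enc-1": (["E11.9", "I10", "I50.9"], ["E11.9", "J44.9"]),
    "enc-2": (["I10"], ["I10"]),
})
print(worklist)
```

Here enc-2 is dropped because its note and claim agree. For enc-1, I10 and I50.9 appear as possible undercoding and J44.9 as possible overcoding.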

NLP can reduce the likelihood of these errors by automating the review and coding of patient records. When information is missing, NLP can serve as a tool within clinical documentation improvement (CDI) programs. This gives the hospital an opportunity to review its current practices and make sure every aspect of the patient's treatment is documented every step of the way.
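In CDI work, a common type of query asks the physician to add specificity when a diagnosis is documented too generally to support a specific code. The sketch below flags notes that mention a base diagnosis without any of a list of qualifying terms. The rules and the `cdi_queries` function are hypothetical examples, not clinical or coding guidance. The qualifier check is also deliberately crude, because it accepts a qualifier found anywhere in the note.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules: base diagnosis -> qualifiers that would support a more
# specific code. Illustrative only, not clinical or coding guidance.
SPECIFICITY_RULES: Dict[str, Tuple[str, ...]] = {
    "heart failure": ("systolic", "diastolic", "acute", "chronic"),
    "diabetes": ("type 1", "type 2"),
}


def cdi_queries(note_text: str) -> List[str]:
    """Draft a CDI query for each diagnosis documented without specificity."""
    text = note_text.lower()
    queries = []
    for diagnosis, qualifiers in SPECIFICITY_RULES.items():
        mentioned = re.search(r"\b" + re.escape(diagnosis) + r"\b", text)
        # Crude check: any qualifier anywhere in the note counts as specific.
        specific = any(qualifier in text for qualifier in qualifiers)
        if mentioned and not specific:
            queries.append(
                f"'{diagnosis}' is documented; please clarify "
                f"({', '.join(qualifiers)})."
            )
    return queries


print(cdi_queries("Hx of heart failure and type 2 diabetes."))
```

In this example only heart failure is queried, because "type 2" already makes the diabetes mention specific.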

How Can AI, Cloud, Open Source, or Progressive Technologies Help?

To get a holistic view of your patient or member population, organizations should integrate clinical notes with other high-value data types such as claims, images, or genomics. Doing so can reveal patterns and correlations related to disease state, billing, or care that are not readily apparent from a tedious manual review of patient charts. Understandably, several challenges make this integration difficult, including:

  • The complex, unstructured formats of these data types, which are often incongruent or incorrect.
  • The quality of these data types, which may require cleansing or correction.
  • The large volume of these unstructured data types and the longitudinal nature of patient records.
  • The cost to stand up an environment to support this integration and any resulting analytics.  
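Much of the cleansing work listed above comes down to normalizing formats and setting aside records that cannot be trusted. The sketch below assumes claims-like records of (patient ID, diagnosis code, service date) arriving from sources that write ICD-10-CM codes with or without the dot, and that write dates in one of two formats. The accepted date formats and the function names are assumptions. The regular expression checks only the general shape of an ICD-10-CM code, not whether the code actually exists.

```python
import re
from datetime import date, datetime
from typing import Iterable, List, Optional, Tuple

# Shape check only: letter, digit, alphanumeric, then an optional dot and
# 1-4 alphanumerics. It does not confirm the code exists in ICD-10-CM.
ICD10_PATTERN = re.compile(r"^[A-Z][0-9][A-Z0-9](\.[A-Z0-9]{1,4})?$")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed source formats

Record = Tuple[str, str, date]


def normalize_code(raw: str) -> Optional[str]:
    """Uppercase, strip spaces, and restore the dot after the third character."""
    code = raw.strip().upper().replace(" ", "")
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code if ICD10_PATTERN.match(code) else None


def parse_date(raw: str) -> Optional[date]:
    """Try each known source format; return None if none of them fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_records(
    records: Iterable[Tuple[str, str, str]],
) -> Tuple[List[Record], List[Tuple[str, str, str]]]:
    """Normalize (patient_id, code, date) records; return (clean, rejected)."""
    seen = set()
    clean: List[Record] = []
    rejected: List[Tuple[str, str, str]] = []
    for patient_id, raw_code, raw_date in records:
        code, service_date = normalize_code(raw_code), parse_date(raw_date)
        if code is None or service_date is None:
            rejected.append((patient_id, raw_code, raw_date))
            continue
        key = (patient_id.strip(), code, service_date)
        if key not in seen:  # drop duplicates arriving from overlapping sources
            seen.add(key)
            clean.append(key)
    return clean, rejected


clean, rejected = clean_records([
    ("p1", "e119", "2023-01-05"),
    ("p1", "E11.9", "01/05/2023"),  # same fact, different formats
    ("p2", "I10", "2023-02-30"),    # impossible date
    ("p3", "XYZ", "2023-03-01"),    # not shaped like an ICD-10-CM code
])
print(clean)
print(rejected)
```

Rejected records are returned rather than silently discarded. This lets someone review them and correct the source system.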

[Figure: Clinical analysis stack (NLP stack)]

Fortunately, progressive technologies such as Apache Spark, Google Cloud Platform (GCP), Kubernetes, Elastic, and React come to the rescue when tackling challenges like these. Apache Spark can help streamline the data ingestion and integration activities that often require significant ETL development. Coupled with cloud technologies such as GCP and Kubernetes, Spark can scale the processing and compute needed to run these ETL jobs, and it can do so cost-effectively.
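To scale ETL over a multi-year backlog, Spark splits the data into partitions, applies the same function to each partition independently, and merges the partial results. The standard-library sketch below mimics that shape with a thread pool. It only illustrates the structure and provides none of Spark's distributed execution, fault tolerance, or shuffle handling. The lexicon and function names are hypothetical. In PySpark, a comparable job might use `mapPartitions` and `reduceByKey` on an RDD, or aggregations on a DataFrame.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical lexicon for illustration only.
LEXICON = {"hypertension": "I10", "copd": "J44.9", "heart failure": "I50.9"}


def partitions(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive fixed-size chunks, like partitions of a dataset."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def count_codes(partition: List[str]) -> Counter:
    """'Map' step: count lexicon codes mentioned in one partition of notes."""
    counts: Counter = Counter()
    for text in partition:
        lowered = text.lower()
        for term, code in LEXICON.items():
            if term in lowered:
                counts[code] += 1  # counts notes mentioning the term, not mentions
    return counts


def process_backlog(notes: Iterable[str], partition_size: int = 1000,
                    workers: int = 4) -> Counter:
    """'Reduce' step: merge per-partition counts into one total."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_codes, partitions(notes, partition_size)):
            total.update(partial)
    return total


backlog = ["HTN: hypertension", "COPD exacerbation", "hypertension, heart failure"] * 10
print(process_backlog(backlog, partition_size=7))
```

Because each partition is processed independently and the merge step only adds counts, the same function can run on any number of workers without changing the result.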

These technologies also support rapid deployment and automation, so developers can iterate and debug quickly. Spark also brings machine learning capabilities under the same umbrella, which supports the required NLP and advanced predictive analytics on the transformed data. Elasticsearch comes in handy for indexing massive amounts of unstructured notes data at scale and delivering search results quickly. Lastly, JavaScript libraries such as React.js are well suited to building responsive, mobile-friendly applications that can deliver complex analytics through an interactive front end.
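To index notes at scale, Elasticsearch's `_bulk` API accepts newline-delimited JSON. Each document is preceded by an action line naming the target index and document ID, and the body must end with a newline. The sketch below builds such a payload along with a sample search body: a `bool` query that combines a full-text `match` on the note with a `term` filter on extracted codes. The index name `clinical-notes` and the field names are hypothetical, and the `term` filter assumes `codes` is mapped as a `keyword` field.

```python
import json
from typing import Any, Dict, Iterable


def bulk_payload(notes: Iterable[Dict[str, Any]], index: str = "clinical-notes") -> str:
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for note in notes:
        # Action/metadata line, then the document source on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_id": note["note_id"]}}))
        lines.append(json.dumps(note))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


def search_body(text: str, code: str) -> Dict[str, Any]:
    """Full-text search on note text, filtered to notes tagged with a code."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"note_text": text}}],
                "filter": [{"term": {"codes": code}}],
            }
        }
    }


notes = [
    {"note_id": "n1", "patient_id": "p1",
     "note_text": "Heart failure, stable on current meds.", "codes": ["I50.9"]},
]
print(bulk_payload(notes), end="")
print(json.dumps(search_body("shortness of breath", "I50.9"), indent=2))
```

The payload would be sent with a `POST` to `/_bulk` and a `Content-Type: application/x-ndjson` header. The official Python client's `helpers.bulk` function can also build and send these requests.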


Organizations that rely primarily on structured data for their analytics and decision-making can get a fuller, more accurate picture of their patient population by incorporating insights from their unstructured data. We discussed several use cases for applying NLP to clinical notes. In subsequent articles, I will dig deeper into other opportunities to leverage clinical notes data.

Opinions expressed by DZone contributors are their own.
