5 Examples of Big Data in Healthcare

Healthcare costs are driving the demand for big data-driven healthcare applications. Technology decision-makers in healthcare systems can't ignore the increased efficiencies, the attractive economics, and the rapid pace of innovation that can now be applied to delivering and paying for healthcare. Many are finding that new standards and incentives for digitizing and sharing healthcare data, along with improvements and decreasing costs in storage and parallel processing on commodity hardware, are causing a big data revolution in healthcare with the goal of better care at lower cost.

The healthcare industry can benefit immensely from the use of advanced analytics and big data technologies. In this post, we will look at five big data production examples in healthcare.

1. Valence Health: Improving Outcomes and Reimbursements

Valence Health is using the MapR Converged Data Platform to build a data lake that is the company’s main data repository. Valence consumes 3,000 inbound data feeds with 45 different types of data daily. This critical data includes lab test results, patient health records, prescriptions, immunizations, pharmacy benefits, claims and payments, and claims from doctors and hospitals, which are used to inform decisions about improving both healthcare outcomes and reimbursement. The company’s rapid client growth and the associated increasing volumes of data were straining its existing technology infrastructure.

Prior to their MapR solution, if they received a feed with 20 million lab records, it would take 22 hours to process that data. MapR cut that cycle time down from 22 hours to 20 minutes, running on much less hardware. Valence Health is also now able to accommodate customer requests that were very difficult to address in the past. For example, a customer might call and say, “I sent you an incorrect file three months ago, and I need you to take that file out.” Their traditional database solution might take three to four weeks to get that data deleted. MapR snapshots provide point-in-time recovery that enables Valence to roll back and remove that file in minutes.
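
To put those processing times in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 20-minute figure applies to the same 20-million-record feed as the 22-hour figure, and the variable and function names are invented for the example.

```python
# Figures quoted in the text above.
RECORDS = 20_000_000   # lab records in one feed
BEFORE_HOURS = 22      # processing time before the change
AFTER_MINUTES = 20     # processing time after the change


def records_per_second(records: int, seconds: float) -> float:
    """Average throughput over one processing run."""
    return records / seconds


before_seconds = BEFORE_HOURS * 3600
after_seconds = AFTER_MINUTES * 60

# Ratio of the two cycle times, and the implied average throughput of each.
speedup = before_seconds / after_seconds
before_rate = records_per_second(RECORDS, before_seconds)
after_rate = records_per_second(RECORDS, after_seconds)

print(f"speedup: {speedup:.0f}x")
print(f"before: {before_rate:,.0f} records/s")
print(f"after: {after_rate:,.0f} records/s")
```

Under that assumption, the cycle time shrinks by a factor of 66, and average throughput rises from roughly 253 to roughly 16,667 records per second.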

2. UnitedHealthcare: Fraud, Waste, and Abuse

UnitedHealthcare provides health benefits and services to nearly 51 million people. The company contracts with more than 850,000 physicians and care professionals and approximately 6,100 hospitals nationwide. Their Payment Integrity group has the tough job of ensuring that claims are paid correctly and on time. Their previous approach to managing more than one million claims every day (10 TB of data daily) was ad hoc, heavily rule-based, and limited by data silos and a fragmented data environment. UnitedHealthcare came up with a unique dual-model strategy: operationalizing savings while at the same time pursuing innovation to keep taking advantage of the latest technologies.

Here’s how they are doing it: In terms of operationalizing savings, the group is building a predictive analytics “factory” where they can identify inaccurate claims in a systematic, repeatable way. Hadoop is now the data framework for a single platform that’s equipped with tools to analyze a slew of information from claims, prescriptions, plan participants, contracted care providers, and associated claim review outcomes.

They integrated all this data from multiple data silos across the business, including over 36 data assets. And they now have multiple predictive models (PCR, True Fraud, Ayasdi, etc.) at their fingertips that provide a rank-ordered list of potentially fraudulent providers they can pursue in a targeted, systematic way.
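
The idea of a rank-ordered list built from several models can be illustrated with a minimal Python sketch. It is not UnitedHealthcare's method: the model names, provider IDs, and scores are hypothetical, and simple averaging is an assumption chosen only to show how per-model scores could be combined into a single ordered list.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_providers(
    scores_by_model: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Average each provider's scores across models, highest first.

    scores_by_model maps a (hypothetical) model name to that model's
    risk score per provider ID. A provider that a model did not score
    is averaged over only the models that did score it.
    """
    per_provider: Dict[str, List[float]] = {}
    for model_scores in scores_by_model.values():
        for provider_id, score in model_scores.items():
            per_provider.setdefault(provider_id, []).append(score)

    combined = {pid: mean(vals) for pid, vals in per_provider.items()}
    # Sort by score descending, then by ID so ties come out in a repeatable order.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scores from two hypothetical models.
scores = {
    "model_a": {"prov-1": 0.9, "prov-2": 0.2, "prov-3": 0.5},
    "model_b": {"prov-1": 0.7, "prov-2": 0.4},
}
ranked = rank_providers(scores)
for provider_id, score in ranked:
    print(f"{provider_id}: {score:.2f}")
```

Investigators would then work down the list from the top. A production system would likely weight or calibrate the models rather than average them, but the output has the same shape: one ordered list instead of several unrelated ones.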

3. Liaison Technologies: Streaming System of Record for Healthcare

Liaison Technologies provides cloud-based solutions to help organizations integrate, manage, and secure data across the enterprise. One vertical solution they provide is for the healthcare and life sciences industry, which comes with two challenges: meeting HIPAA compliance requirements and coping with the proliferation of data formats and representations. With MapR Streams, the data lineage portion of the compliance challenge is solved because the stream becomes a system of record by being an infinite, immutable log of each data change. To illustrate the second challenge, a patient record may be consumed in different ways (as a document representation, as a graph representation, or through search) by different users such as pharmaceutical companies, hospitals, clinics, and physicians. By streaming data changes in real time to MapR-DB HBase, MapR-DB JSON document, graph, and search databases, users always have the most up-to-date view of data in the most appropriate format. Further, by implementing this service on the MapR Converged Data Platform, Liaison is able to secure all of the data components together, avoiding the data and security silos that alternative solutions require.
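
The "stream as system of record" pattern described above can be sketched in plain Python. This is a toy, in-memory illustration of the idea, not the MapR Streams API: the `ChangeLog` class, the `ChangeEvent` schema, and the view functions are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    """One immutable change to a patient record (hypothetical schema)."""
    offset: int
    patient_id: str
    field: str
    value: Any


class ChangeLog:
    """Append-only log: events are added at the end and never modified."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, patient_id: str, field: str, value: Any) -> ChangeEvent:
        event = ChangeEvent(len(self._events), patient_id, field, value)
        self._events.append(event)
        return event

    def read(self, from_offset: int = 0) -> Tuple[ChangeEvent, ...]:
        # Return a tuple so callers cannot alter the recorded history.
        return tuple(self._events[from_offset:])


def document_view(events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Latest value of each field per patient (document-style view)."""
    docs: Dict[str, Dict[str, Any]] = {}
    for e in events:
        docs.setdefault(e.patient_id, {})[e.field] = e.value
    return docs


def search_view(events: Iterable[ChangeEvent]) -> Dict[str, Set[str]]:
    """Inverted index from current field values to patient IDs."""
    index: Dict[str, Set[str]] = {}
    for patient_id, doc in document_view(events).items():
        for value in doc.values():
            index.setdefault(str(value).lower(), set()).add(patient_id)
    return index


def lineage(events: Iterable[ChangeEvent], patient_id: str, field: str) -> List[Tuple[int, Any]]:
    """Full history of one field, in log order, for audit purposes."""
    return [(e.offset, e.value) for e in events
            if e.patient_id == patient_id and e.field == field]


log = ChangeLog()
log.append("p1", "medication", "Aspirin")
log.append("p2", "medication", "Aspirin")
log.append("p1", "medication", "Ibuprofen")

print(document_view(log.read()))
print(search_view(log.read()))
print(lineage(log.read(), "p1", "medication"))
```

Each view is derived entirely from the log, so a new representation can be added later by replaying the same events, and the lineage of any value can be recovered because earlier events are never overwritten.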

4. Novartis Genomics

Next-Generation Sequencing (NGS) is a classic big data application that deals with the dual challenge of vast amounts of raw heterogeneous data and the fact that best practices in NGS research are an actively moving target. Additionally, much of the cutting-edge research requires heavy interaction with diverse data from external organizations. It requires workflow tools that are robust enough to process vast amounts of raw NGS data yet flexible enough to keep up with quickly changing research techniques. It also requires a way to meaningfully integrate data from Novartis with data from these large external organizations — such as 1000 Genomes, NIH’s GTEx (Genotype-Tissue Expression), and TCGA (The Cancer Genome Atlas) — paying particular attention to clinical, phenotypical, experimental, and other associated data.

The Novartis team chose Hadoop and Apache Spark to build a workflow system that allows them to integrate, process, and analyze diverse data for NGS research while being responsive to advances in the scientific literature.

5. Healthcare IoT Startup: Working to Classify Heart Conditions Faster

At this startup, the current heart rhythm analysis process is slow, and classification is done manually. Data is uploaded in batches from the devices to the analysis software machines, where medical analysts review the classification data and then submit a report to the doctors and hospitals, who in turn make medical decisions about the patients. The process takes over 24 hours, a long lag before doctors can access the patient data, which increases the risk of medical emergencies.

With MapR-FS, Telemed will now be able to ingest data from various medical devices directly via NFS into their cluster for real-time patient insight. The solution needed to be highly available and to provide multi-tenancy (because of HIPAA) as they start hosting patient data from various hospitals and data from medical device companies. Being able to segment that data by customer was really important.

With the help of MapR Professional Services, they have been able to build out a solution in time for their July 18th HIPAA review deadline and provide an architecture that meets all the requirements for HA, multi-tenancy, and real-time insights. The CEO has met his commitment and deadline to his investors, and the company is on track to start selling its SaaS solutions in Q3/Q4.


Improving patient outcomes at the same or even lower cost is an extraordinarily tall order for any healthcare provider, given that overall costs of healthcare are rising in the US at a lofty 15% clip. Full-scale digital transformation is the key to making this goal a reality, with digitization, enhanced communications, and big data analytics being the legs that support the transformation effort. The many emerging use cases for big data analytics are intimately tied to the ability of Hadoop-based solutions to acquire and store massive quantities of disparate data, both structured and unstructured, from just about any source and present it for in-depth analysis.

In selecting a big data platform and in particular a Hadoop distribution, be sure the platform is highly adept at handling the mix of data types in healthcare typically housed in silos, with clinical data in one silo, pharmaceutical data in another, and logistics information on hospital supplies in yet another. This platform should be flexible enough so that caregivers can use complex data like doctors’ notes and imaging files for real patient analysis, not just for archiving.

Published at DZone with permission of
