Hortonworks Genomics and Precision Medicine Solutions

How Hadoop and Spark help cancer research at Arizona State University.

Before we drill down into how Hortonworks partnered with Arizona State University (ASU) to design and develop a platform to discover genomic links to cancer, let’s take a look at a few of cancer’s fundamental attributes. Cancer is both a complicated and a complex disease.

Cancer is complicated because it is not actually a single disease, but rather the result of the dynamic interplay between many moving biological and environmental components.

Cancer is complex because it is a living system that behaves in a systematic way beyond the sum of its component parts. For example, think of the pointillist painters who put individual dots of color on canvas—only when you look at the dots from a distance do you understand what the painting portrays.

ASU turned to Hadoop and Big Data approaches to see through cancer’s complexity. Now the ASU team can both store more “dots” and process combinations of dots in different ways to understand the relationships that form the complete picture. Before adopting the Hortonworks Data Platform (HDP), the team could neither store enough data nor analyze the complex relationships between the parts. Read the full case study here.

Data architects and researchers at Arizona State University worked with the Apache™ Hadoop® experts at Hortonworks to build a genomics data lake that integrates HDP (with Apache Spark) into a high-performance computing (HPC) environment. ASU researchers now use HDP to store and process genomics data on thousands of individuals in order to better understand how each variant in a genome might influence cancer risk and response to treatment.

Dr. Kenneth Buetow, a leading cancer researcher at ASU, sums up the transformation this way: “What we’ve been able to do already is find relationships that previously had not been described. The reason this was not practical to do in the past is that there’s literally millions of variants and tens of thousands of individual genes. It was just not computationally and/or storage practical to do all those possible combinations.”
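To give a rough sense of the scale Dr. Buetow describes, the short Python sketch below counts how many combinations such an analysis might have to consider. The specific figures (one million variants, 20,000 genes) are illustrative round numbers chosen to match "millions" and "tens of thousands"; they are assumptions, not ASU's actual dataset sizes, and the function names are hypothetical.

```python
from math import comb


def variant_gene_pairs(n_variants: int, n_genes: int) -> int:
    """Count every (variant, gene) pairing that could be tested."""
    return n_variants * n_genes


def variant_variant_pairs(n_variants: int) -> int:
    """Count every unordered pair of distinct variants."""
    return comb(n_variants, 2)


# Illustrative round numbers only (assumptions, not ASU's real figures).
N_VARIANTS = 1_000_000
N_GENES = 20_000

print(f"variant-gene pairs:    {variant_gene_pairs(N_VARIANTS, N_GENES):,}")
print(f"variant-variant pairs: {variant_variant_pairs(N_VARIANTS):,}")
```

Even with these modest assumed inputs the counts run into the tens and hundreds of billions, which suggests why the work calls for distributed storage and processing.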

Register for the June 1 Webinar with ASU’s Scientific and IT Experts

On Wednesday, June 1, I will host a webinar detailing ASU’s Big Data journey with genomics, focusing on cancer research and precision medicine. I will be joined by two leaders who drove this exciting initiative at ASU:


Jay Etchings is Director of Operations for Research Computing at ASU. His team built the genomics data lake and integrated it within a high-performance compute environment that uses Apache Spark for genomic queries and data workloads.

Dr. Kenneth Buetow is the Director of the Computational Sciences and Informatics Program at ASU. Before joining ASU in 2012, Dr. Buetow was director of the Center for Biomedical Informatics and Information Technology at the National Cancer Institute, within the US National Institutes of Health. Now his team uses HDP to analyze genomic data to discover insights that may shed light on innovative approaches to understanding and treating cancer.


Published at DZone with permission of Richard Proctor, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
