
Together We Can Solve the Challenges Facing Genomics Research

This article takes a look at using Big Data to solve the problems in Genomics research with an open foundation.



You may have seen our invite to join the genomics consortium.

Let me recap what this is about and bring you up to speed on our progress and next steps. Hortonworks is quarterbacking a consortium of leading healthcare organizations and subject matter experts to help develop the platform requirements for next-generation research. We all know that genomics is revolutionizing the field of medicine, but a number of technical and data-related challenges remain before its full potential can be realized.

Our objective is to tackle these challenges and deliver the world’s gold-standard open-source genomics platform: a data warehouse and analytical foundation that accelerates precision medicine in research and clinical care.

We’ve made some great progress since the first announcement at Hadoop Summit in late June and have seen firsthand how an ecosystem of like-minded individuals from multiple organizations can identify the key problems and move to resolution faster than any one company. Here is what Jason Ross of OneOme said: “The real problem is how do we transform all this available genomic information into knowledge. The real challenge is how to store and analyze massive amounts of data in a platform AND keep everything connected and in the context of biology. Now do this at scale… Our initiative is focused on providing a platform that enables massive analysis and drives greater utility in real-world patient care needs instead of academic genomics.”

Here are four insights into what we have learned to date…

  1. Participants tell me personally that this consortium has the potential to address the core challenges. Our objective is to tackle these challenges by creating a scalable, open-source platform for genomic storage and analytics. They say we’re on the right track.
  2. The response has been great, with both companies and individuals joining and participating. We’ve had in-person group meetings, and consortium members have conducted dozens of persona-based interviews as we work through defining the key issues with current technologies. There has been vigorous debate.
  3. Design Thinking methodology has been invaluable in keeping the consortium focused on defining the issues completely before we move to the development phase.
  4. We have made progress, but there is much to do. So far, members have been able to define and validate key challenges such as:
    1. Searching through multiple databases is inefficient
    2. Lack of metadata management and data curation hinders collaboration
    3. Issues around data quality reduce data confidence
    4. Lack of standardization (across organizations) in data storage and management
    5. Inability to store phenotype information along with genomics data
    6. Inability to create graphs involving multiple types of information (genomics, phenotypes, pedigree, etc.) and find relationships among them
    7. Need to use genomics data to identify at-risk populations
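
To make challenges 5 and 6 concrete, here is a minimal Python sketch of what linking genomic, phenotype, and pedigree information in one graph could look like. It is not the consortium's design: the node names, relation names, and helper functions below are all hypothetical, and a real platform would sit on a distributed store rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical toy records: every identifier here is invented for illustration.
# Each edge is (source node, relation, target node).
EDGES = [
    ("patient:P1", "has_variant", "variant:V1"),
    ("patient:P2", "has_variant", "variant:V1"),
    ("patient:P1", "has_phenotype", "phenotype:PH1"),
    ("patient:P2", "has_phenotype", "phenotype:PH1"),
    ("patient:P3", "has_phenotype", "phenotype:PH1"),
    ("patient:P1", "parent_of", "patient:P2"),  # pedigree link
]


def build_graph(edges):
    """Index edges by target node so we can ask 'who points at this node?'."""
    incoming = defaultdict(set)
    for source, relation, target in edges:
        incoming[target].add((relation, source))
    return incoming


def sources_of(incoming, node, relation):
    """Return the sorted sources linked to `node` by `relation`."""
    return sorted(src for rel, src in incoming.get(node, ()) if rel == relation)


def carriers_with_phenotype(incoming, variant, phenotype):
    """Patients who both carry the variant and show the phenotype."""
    carriers = set(sources_of(incoming, variant, "has_variant"))
    affected = set(sources_of(incoming, phenotype, "has_phenotype"))
    return sorted(carriers & affected)


graph = build_graph(EDGES)
shared = carriers_with_phenotype(graph, "variant:V1", "phenotype:PH1")
parents_of_p2 = sources_of(graph, "patient:P2", "parent_of")
```

Because variants, phenotypes, and family links live in one structure, a single query can cross all three, which is the kind of relationship-finding the members described as missing today.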

It’s not too late to participate and be part of the movement to define and build the definitive platform for genomics and cancer research on the planet. It’s your consortium, and our role is to…

  1. Lead the effort in design, development, and release of the platform based on input from the participants
  2. Lead the effort to collect Big Ideas and business requirements for the genomics data warehouse and analytics platform
  3. Conduct Design Thinking workshops, 1:1 interviews, focus groups, and surveys
  4. Facilitate Value Discovery workshops to identify value drivers for each of the prioritized Big Ideas / initiatives
  5. Define the product delivery roadmap based on input from participants and other advisors
  6. Develop, test, and release the product as part of the open-source community

We look forward to the next in-person meeting where we will: finish the persona interviews and secondary research, synthesize the data from the research to identify key gaps and requirements, determine the platform capabilities based on the needs, and create the product roadmap.

Just as the open-source methodology has provided greater innovation through collaboration, the open-source consortium is moving faster together to deliver a better solution for us all.



Published at DZone with permission of Richard Proctor, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
