Dr. Mahout: Analyzing Clinical Data Using Scalable and Distributed Computing
If you haven't already, I recommend you take a look at a couple of our other posts covering Isabel Drost's presentations on Apache Mahout, originally a sub-project of Lucene, at Devoxx 2010 and more recently at ApacheCon NA 2011. Drost has a unique perspective as a co-founder of Mahout, and her talks should give you a good feel for what the toolset is and the kinds of problems (machine learning, data mining, etc.) that it's meant to solve.
After you've done that, or if you already know what Mahout is all about, you should definitely check out another presentation from ApacheCon, this time by Shannon Quinn, a doctoral candidate in the Joint Carnegie Mellon-University of Pittsburgh PhD program for computational biology. Quinn's presentation provides a highly practical example of how Mahout can be used in the medical field. Clinicians deal with a ton of patient data, and Quinn thinks Mahout is the right tool to organize and model that data. From the presentation's abstract, which deals with diagnosing patients suffering from the rare genetic disease primary ciliary dyskinesia:
Here we propose using the Mahout framework to efficiently learn models that capture the motion patterns observed in the videos and aiding in objective diagnoses. Through this framework, clinicians will need only take biopsies, gather data as images or videos, upload them to a Mahout/Hadoop cluster, and wait for the results. Patient privacy is maintained by perpetuating only the results of the analysis; computational time is reduced by parallelizing the model learning and comparison process; and models are available to clinicians everywhere through the cloud.
To listen to the entire presentation, grab the audio here.