Every week here and in our newsletter, we feature a new developer/blogger from the DZone community to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Adam Diaz, Hadoop Architect at the Teradata Big Data Center of Excellence and featured author in DZone's upcoming 2014 Guide to Big Data.
1. What have you been working on lately?
I have been working on a number of things, including Docker as well as Scalding. I have been meaning to circle back around to YARN to start writing my own ApplicationMaster, but I haven't gotten around to it. I think the use of Apache Slider is interesting in lieu of a full-blown ApplicationMaster, but the super-geeked-out "I'm home coding on Saturday night" side of me wants to write my own native YARN ApplicationMaster. I have been playing with Spark quite a bit now that it is YARN-enabled.
2. You wrote about the evolution of Hadoop and MapReduce for the Guide to Big Data. What do you think is the future of Hadoop and MapReduce? Will other tools and techniques become more central?
I think any time someone like Google announces that a technology is no longer the shiny new object, it sends waves through the tech world. While I'm sure they have their reasons, I think Hadoop is still evolving as it should. Apache Tez is really taking batch and interactive workloads to the next level with DAGs that aren't possible in MapReduce. This is really going to give people the flexibility and speed they're seeking for new applications. Of course, "real time" technologies have everyone excited because of their ability to reduce time to insight from data. That has real value, and it's important in that we aren't just playing with computers: all this tech has a point, and the point is value from data.

My contention is that there is room for different tools for different purposes. MapReduce is still a great paradigm for manipulating large data sets in many cases, and Tez will let that happen with much greater speed and complexity. Alternate DAG engines like Spark are, in my mind, the healthy output of a robust, competitive community of developers working to find the best answer. The great news is that we all now benefit from expanded choice. I think people will use the tool that best fits their use case. Long term, many new technologies will come along, but many folks have a current investment in MapReduce. Users will migrate to new tech as it proves worthy, but the general trend now is toward tech that can decrease time to insight by analyzing data as it is received, while also storing data for analysis against much larger historical data sets.
3. Are there any particular developer tools or resources you couldn't live without?
It might sound funny, but I really just prefer vi. I like IDEs but simultaneously hate them; they all have their nuances. I really like GitHub for helping me not lose code and, obviously, for being able to share it freely.
4. Do you have a favorite open source project (or projects) that you've contributed to recently?
Not one that I have contributed to, but I do like Apache Hadoop the best. I think I say that because I come from a background in HPC, so YARN specifically just makes sense to me. It looks and feels like a maturing version of the workload management systems many of us have used for years in clusters. I also like Apache Ambari quite a bit. I like the REST API for controlling your clusters, and I'm excited to see new developments like Ambari Views come out.
5. Do you follow any blogs or Twitter feeds that you would recommend to developers?
I think people would find value in http://lambda-architecture.net/ for thinking about how all these Hadoop components might best fit together. If you really want to dig into Hadoop tech, the Hortonworks developer blog will show you things hot off the press from the guys writing the code. Another really good one is the Hadoop Weekly newsletter (http://www.hadoopweekly.com/), and the Cloudera blogs as well. I also enjoy http://bradhedlund.com/.
6. Did you have a coding first love -- a particular program, gadget, game, or language that set you on the path to life as a developer?
I started out on the VIC-20 and Commodore 64 writing little programs, but not until I bumbled into Linux did I do any real coding. Outside of shell scripting, I really started to program in S-PLUS, which is a commercial version of R. I worked with the Bioconductor libraries as well as custom code for analytics in bioinformatics, processing microarray and DNA/RNA sequence data. As I outgrew S-PLUS, I moved to Python and Java.
7. Is there anything else you'd like to mention?
You can reach me at http://www.techtonka.com.
Be sure to follow Adam on Twitter, and read his work in DZone's 2014 Guide to Big Data: