This Week in Hadoop and More: Spark, NiFi, and Events
Weekly wrap up on Hadoop, Big Data, Spark, NiFi, and more.
Welcome back to "This Week in Hadoop and More". There always seems to be a lot more. There are a lot of interesting things coming up: I'll have an article on Concord.io's new distributed streaming framework and another on using SnappyData with Zeppelin, Spark, and Hortonworks HDP. Those articles came out of talks I had last week with principals from those two new Big Data startups. These are definitely companies and products to watch. Based on last week's snafu with Calcite, I will sit down with some contributors to this awesome Apache project to find out more and present it to you. Safe to say, Calcite is being used by a number of other Big Data projects you know and love (Apache Drill, Apache Flink, Apache Hive, Apache Kylin, Apache Phoenix, Apache Samza, and Apache Storm).
Also coming soon is my new meetup in Princeton. Hopefully I can meet some DZone readers in person and we can continue sharing information! Join us at the amazing TigerLabs.
There are a number of great new presentations on Apache Spark and specifically streaming with Spark Streaming. These are coming from Strata Hadoop London, which gives us a ton of great content this week.
Another Great Advanced Spark Deck by Chris Fregly
Strata Hadoop in London has released a ton of the videos from their very recent event. These are all worth watching to learn what's going on and what's coming in the world of Big Data.
Here are some great keynotes from Strata/Hadoop 2016 London, along with a number of slide decks available for viewing or download.
This talk using the excellent machine learning library from H2O is a must-watch: Jo-fai Chow (H2O.ai)
Why is my Hadoop job slow?
Bikas Saha (Hortonworks Inc)
This is a cool talk about finding out why your job is not performing well.
Watermarks: Time and progress in streaming dataflow and beyond
Slava Chernyak (Google Inc.)
This is a really interesting streaming talk by Google!!
Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines in Apache Beam (Google Cloud Dataflow).
Kenneth Knowles (Google)
For Hadoop Programmers
Apache Storm Debugging
Storm 1.0 Enhanced Debugging. For die-hard Storm developers, this will be a godsend for finding problems more easily.
http://hortonworks.com/blog/whats-new-apache-storm-1-0-part-1-enhanced-debugging/
Open BDRE
Bigdata Ready Enterprise Open Source Software (BDRE) is an interesting new open source workload management tool that works with Spark and Hadoop. It seems like a great tool to try in most enterprises.
http://wiproopensourcepractice.github.io/openbdre/
Hadoop Development Testing
Bite-sized HDP clusters in your IDE for development
https://github.com/sakserv/hadoop-mini-clusters
Genius! This is a great way to do integration testing for Hadoop projects.
Apache NiFi for Rapid DataFlow
Excellent Tutorial on using Apache NiFi
http://hortonworks.com/blog/apache-nifi-not-scratch/
More Great Recent Presentations