This Week in Hadoop and More: Spark, NiFi, and Events

Weekly wrap up on Hadoop, Big Data, Spark, NiFi, and more.

Welcome back to "This Week in Hadoop and More". There always seems to be a lot more. Several things are coming up: I'll have an article on Concord.io's new distributed streaming framework and another on using SnappyData with Zeppelin, Spark, and Hortonworks HDP. Both came out of talks I had last week with principals from these two new Big Data startups. These are definitely companies and products to watch. Following last week's snafu with Calcite, I will sit down with some contributors to this awesome Apache project to find out more and report back to you. Safe to say, Calcite is used by a number of other Big Data projects you know and love (Apache Drill, Apache Flink, Apache Hive, Apache Kylin, Apache Phoenix, Apache Samza, and Apache Storm).


Also coming soon is my new meetup in Princeton.  Hopefully I can meet some DZone readers in person and we can continue sharing information!  Join us at the amazing TigerLabs.


There are a number of great new presentations on Apache Spark, and specifically on streaming with Spark Streaming. They come from Strata Hadoop London, which gives us a ton of great content this week.

Why is my Hadoop job slow?
Bikas Saha (Hortonworks Inc)
This is a cool talk on finding out why your job is not performing well.

Watermarks: Time and progress in streaming dataflow and beyond
Slava Chernyak (Google Inc.)
This is a really interesting streaming talk by Google!

Triggers in Apache Beam (incubating): User-controlled balance of completeness, latency, and cost in streaming big data pipelines in Apache Beam (Google Cloud Dataflow).
Kenneth Knowles (Google)

For Hadoop Programmers

Apache Storm Debugging
Storm 1.0 Enhanced Debugging. For die-hard Storm developers, this will be a godsend for finding problems more easily.
http://hortonworks.com/blog/whats-new-apache-storm-1-0-part-1-enhanced-debugging/

Open BDRE
Bigdata Ready Enterprise (BDRE) is an interesting new open source workload management tool that works with Spark and Hadoop. It seems like a great tool to try in most enterprises.
http://wiproopensourcepractice.github.io/openbdre/

Hadoop Development Testing
Bite-sized HDP clusters in your IDE for development. Genius! This is a great way to do integration testing for Hadoop projects.
https://github.com/sakserv/hadoop-mini-clusters

Apache NiFi for Rapid DataFlow
An excellent tutorial on using Apache NiFi.
http://hortonworks.com/blog/apache-nifi-not-scratch/

More Great Recent Presentations

Topics:
hadoop, big data, hortonworks, spark, kafka
