Big Data 'Must Bookmark' Links of 2016


Find some of the key resources and projects on the web to guide your Big Data strategy going into 2016.


Maptive has assembled a must-follow list of 100 Big Data people on Twitter. Amazing list! And don't forget the Big Data Awesome list on GitHub.

That's a great start, but what else do you have?

How about KDnuggets, which has tons of Data Science and Big Data links?

Next up, a list of must-read blogs that intersect Big Data, Cloud, Microservices, Containers, and IoT:

  • My old company, Pivotal, has software and solutions for all the things.

  • IoT Central for all things...

  • Databricks are heavy, but when the topic is Apache Spark, they are a must-read.

  • Elephants never forget and publish great Big Data content.

  • Intel is inside lots of things, but they have awesome Big Data knowledge.

  • Hortonworks' top 10 blog posts from 2015.

  • Spring is now the programming glue for everything, without the XML configuration nightmare of yore.

Where's every project you are interested in? The Apache Big Data Stack, from Spark to Hadoop to NiFi, Kafka, ... Everything is under the Apache umbrella.

What's new?

  • Apache NiFi, an awesome GUI-driven Big Data project backed by Hortonworks.

  • Apache Zeppelin, a very cool Big Data notebook.

  • Apache Geode, again from my friends at Pivotal. This is an awesome in-memory data grid, commercially known as GemFire.

  • Apache Airavata, multitasking supertool.

  • Apache DataFu, best named Apache Big Data Project in my mind.

  • Apache Crunch, stays crispy in milk, MapReduce and Spark. You had me at crunch.

  • Apache Falcon, an interesting data management project.

  • Apache Flink, the superfast squirrel that came out of nowhere and exploded.

  • Apache Tajo, a distributed relational data warehouse on Hadoop. Used in CDAP, not sure how this isn't huge yet.

  • Apache Phoenix, fast relational layer over HBase.

  • Apache HAWQ, fast MPP SQL on Hadoop, open sourced from Pivotal.

  • Apache Giraph, a highly scalable graph processing system, adds to the huge list of graph processing solutions out there.

  • Apache Hama, a BSP (Bulk Synchronous Parallel) framework for Big Data analytics. This one is still being baked, but could be insanely useful. I am waiting and watching this one.

  • Apache Helix, a clustering and partitioning solution that works with ZooKeeper.

  • Apache MetaModel, a common interface to a ton of different data sources including HBase, RDBMS and NoSQL stores.

  • Apache ORC, yet another file format.   Also, Apache Parquet and Apache Avro.

  • Apache MADlib, Machine Learning in SQL on PostgreSQL, Greenplum and HAWQ.

  • Apache Gora, an in-memory data model.

  • Apache Twill, a layer over YARN.

  • Apache Accumulo, a key-value store on HDFS with cell-level security.

  • Apache Drill, SQL on top of NoSQL, Hadoop and RDBMS.

  • Apache Chukwa, analysis and monitoring for Hadoop.

  • Apache Ambari, the slick install, configuration and administration tool for Hadoop.

  • Apache Slider, not a small hamburger but a framework on top of YARN for better clustering.

  • Apache Storm, distributed real-time computation framework that is widely used with Hadoop.

Some of my favorite new things to experiment with:

Open Data Platform's Sandbox for Hadoop.

Pivotal's Spring Cloud Data Flow.

