This Week in Hadoop and More: Deep Deep Learning and Streaming Ingest
Take a look at another weekly update on the state of Big Data, Hadoop, Machine Learning, Deep Learning, and the Internet of Things.
This article has lots of cool Big Data, Deep Learning, Machine Learning, and real-time streaming ingest tools, articles, and examples.
Hive + Druid Tuning shows how to index and rapidly query uber fast data sets with Apache Hive and Druid in HDP 2.6. This is really cool stuff and coming to a cluster near you! Hive Druid Performance is a great article on performance enhancements with Hive and Druid.
Spotify's Scala Feature Transformation for Flink, Scalding, Scio, and Spark is a great new library for working with Machine Learning in Scala in all the major Scala frameworks.
Often, you need to generate a lot of data in a specific format. While you can easily do that in Apache NiFi, sometimes you want to do it in code. Spotify's Ratatool has you covered: it handles random data sampling and generation for formats including Avro, Parquet, and Protobuf.
This is a real-time query engine that lets you run queries on very large data streams without using a persistence layer, which makes it lightweight, cheap, and fast on Storm. It is very interesting and makes for a great way to quickly query huge Hadoop data sets.
Another cool tool from Yahoo!, EGADS (Extensible Generic Anomaly Detection System), is an open-source Java package for automatically detecting anomalies in large-scale time-series data.
Yahoo! also has a cool microservice framework for easily building RESTful web services for time series reporting with Big Data analytics engines like Druid and Hive.
Apache Big Data 2017 in Miami Wrap-up
Using Flink for IoT
A cool talk on NiFi and MiniFi