This Week in Hadoop and More: Deep Deep Learning and Streaming Ingest

Take a look at another weekly update on the state of Big Data, Hadoop, Machine Learning, Deep Learning, and the Internet of Things.

· Big Data Zone ·
This week's roundup collects cool Big Data, Deep Learning, Machine Learning, and real-time streaming ingest tools, articles, and examples.

Hive + Druid Tuning shows how to index and rapidly query uber fast data sets with Apache Hive and Druid in HDP 2.6. This is really cool stuff and coming to a cluster near you! Hive Druid Performance is a great article on performance enhancements with Hive and Druid.

Spotify's Scala Feature Transformation for Flink, Scalding, Scio, and Spark is a great new library for working with Machine Learning in Scala in all the major Scala frameworks.

Often, you need to generate a lot of data in a specific format. While you can easily do that in Apache NiFi, sometimes you want to do it in code. Spotify's Ratatool has you covered: it is a tool for random data sampling and generation, with support for formats including Avro, Parquet, and Protobuf.
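
To give a feel for what generating test data "in code" can look like, here is a minimal Python sketch that builds random records from a small hand-written schema. It uses only the standard library and does not call Ratatool or reproduce its API. The `SCHEMA` layout and the `random_value`, `random_record`, and `generate` helpers are hypothetical names chosen for this illustration.

```python
import random
import string

# Hypothetical schema: field name -> type tag.
# This is an illustration only, not an Avro, Parquet, or Protobuf schema.
SCHEMA = {"id": "int", "name": "string", "score": "float", "active": "boolean"}


def random_value(type_tag, rng):
    """Return one random value for the given type tag."""
    if type_tag == "int":
        return rng.randint(0, 1_000_000)
    if type_tag == "float":
        return rng.random()
    if type_tag == "boolean":
        return rng.random() < 0.5
    if type_tag == "string":
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    raise ValueError(f"unsupported type: {type_tag}")


def random_record(schema, rng):
    """Build one record with a random value for every field in the schema."""
    return {field: random_value(tag, rng) for field, tag in schema.items()}


def generate(schema, n, seed=None):
    """Generate n random records; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    return [random_record(schema, rng) for _ in range(n)]


if __name__ == "__main__":
    for record in generate(SCHEMA, 3, seed=42):
        print(record)
```

Passing a seed keeps the generated data reproducible between runs, which is handy when the data feeds a test suite.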

Yahoo!'s Bullet

This is a real-time query engine that lets you run queries on very large data streams without using a persistence layer, which makes it lightweight, cheap, and fast on Storm. This is very interesting and makes for a great way to quickly query huge Hadoop data sets.

Another cool tool from Yahoo!, EGADS (Extensible Generic Anomaly Detection System), is an open-source Java package to automatically detect anomalies in large-scale time-series data.
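
For readers new to the idea, here is a minimal Python sketch of one very simple way to flag anomalies in a time series: compare each point with the mean and standard deviation of the points just before it. This is a generic rolling z-score check and not EGADS's algorithm or API. The `detect_anomalies` function and its `window` and `threshold` parameters are hypothetical names used only for this example.

```python
from statistics import mean, pstdev


def detect_anomalies(series, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the preceding window.

    A point is flagged when its distance from the mean of the previous
    `window` points exceeds `threshold` standard deviations of that window.
    """
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Assumption for this sketch: with a perfectly flat history,
            # any change at all is treated as an anomaly.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    readings = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
    print(detect_anomalies(readings))
```

A spike inflates the mean and standard deviation of the windows that follow it, so points right after an anomaly are judged more leniently. Production systems typically use more robust models than this.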

Yahoo! also has a cool microservice framework for easily building RESTful web services for time series reporting with Big Data analytics engines like Druid and Hive.

Apache Big Data 2017 in Miami Wrap-up

All the videos from this event are here. There are a number of great talks; here are a few I highly recommend:

Using Flink for IoT

A cool talk on NiFi and MiNiFi

Apache Ignite SQL Grid

Apache Spark Best Practices

SparkSQL Plus Pig

GPU on Spark

Apache Beam for Everything

Helium Makes Zeppelin Fly

Petabyte Log Management at Twitter With Tez

Spark and Hive Meter Querying

Topics:
big data, machine learning, deep learning, hadoop, streaming
