The Big Data Zone: Best of the Week (Apr. 12-19)

In case you missed them, here are the best posts from this week's edition of The Big Data Zone (April 12th-17th), hand-picked by the zone's curator. This week: a subtle way to over-fit a new set of data, "on the other side of Big Data," streaming data with Apache Ignite, visualizing matrix multiplication as a linear combination, and growing a spam tree.


1. A Subtle Way to Over-Fit

If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So here’s what you could do.


2. On the Other Side of Big Data

We often discuss big data in the context of helping businesses improve their marketing efforts or cut back on expenses. In fact, we almost always discuss big data from a business point-of-view, rarely mentioning what it’s like on the other side, how it feels to be the audience in the era of big data analytics.


3. Streaming and Transforming Data with Apache Ignite

In its 1.0 release, Apache Ignite added much better streaming support, with the ability to perform various data transformations and to query the streamed data using standard SQL queries.


4. Visualizing Matrix Multiplication As a Linear Combination

When multiplying two matrices, there's a manual procedure we all know how to go through. While it's the easiest way to compute the result manually, it may obscure a very interesting property of the operation. In this quick post I want to show a colorful visualization that will make this easier to grasp.
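
The property mentioned above can be checked with a small sketch. It is not taken from the linked post: the matrices A and B and the helper names are made up for illustration. It compares the familiar row-by-column procedure with building each column of the product as a linear combination of the columns of A, using the entries of the matching column of B as weights.

```python
def matmul(a, b):
    """Textbook row-by-column product; matrices are lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column(m, j):
    """Return column j of matrix m as a list."""
    return [row[j] for row in m]


def matmul_by_columns(a, b):
    """Build each column of A*B as a linear combination of the columns of A."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    result_columns = []
    for j in range(n_cols):
        combo = [0] * n_rows
        for k in range(n_inner):
            weight = b[k][j]  # coefficient applied to column k of A
            combo = [c + weight * v for c, v in zip(combo, column(a, k))]
        result_columns.append(combo)
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*result_columns)]


# Hypothetical example matrices (3x2 times 2x2).
A = [[1, 2], [3, 4], [5, 6]]
B = [[7, 8], [9, 10]]

print(matmul(A, B))
print(matmul_by_columns(A, B))
```

Both calls print the same matrix. For instance, the first column of the product is 7 times the first column of A plus 9 times the second column of A.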


5. Growing a Spam Tree

Consider the following toy dataset, with some spam/ham information, and two words, “viagra” and “lottery”. For the first node, compute the Gini index for the two variables. The Gini index is maximal for “viagra”, so that will be the first node.
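
To make the choice of the first node concrete, here is a minimal sketch under stated assumptions. The ten rows below are hypothetical and are not the dataset from the linked post. The score used is the Gini gain (the drop in Gini impurity after splitting on a word), which is one common convention under which the best split has the maximal score; the linked post may define its index differently.

```python
def gini_impurity(labels):
    """Gini impurity of a list of 0/1 labels (1 = spam, 0 = ham)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - (p * p + (1.0 - p) * (1.0 - p))


def gini_gain(rows, feature):
    """Drop in impurity when rows are split on a 0/1 word indicator."""
    labels = [r["spam"] for r in rows]
    present = [r["spam"] for r in rows if r[feature] == 1]
    absent = [r["spam"] for r in rows if r[feature] == 0]
    weighted = (
        len(present) * gini_impurity(present) + len(absent) * gini_impurity(absent)
    ) / len(rows)
    return gini_impurity(labels) - weighted


# Hypothetical toy data: each row is one message.
ROWS = [
    {"spam": 1, "viagra": 1, "lottery": 0},
    {"spam": 1, "viagra": 1, "lottery": 1},
    {"spam": 1, "viagra": 1, "lottery": 0},
    {"spam": 1, "viagra": 0, "lottery": 1},
    {"spam": 0, "viagra": 0, "lottery": 0},
    {"spam": 0, "viagra": 0, "lottery": 1},
    {"spam": 0, "viagra": 0, "lottery": 0},
    {"spam": 0, "viagra": 0, "lottery": 0},
    {"spam": 0, "viagra": 1, "lottery": 0},
    {"spam": 0, "viagra": 0, "lottery": 0},
]

gains = {word: gini_gain(ROWS, word) for word in ("viagra", "lottery")}
first_split = max(gains, key=gains.get)  # word with the largest gain
print(gains, first_split)
```

On this made-up data the gain is larger for “viagra” than for “lottery”, so “viagra” would be picked for the first node, matching the outcome described above.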

Topics: bigdata, big data, best of the week

Opinions expressed by DZone contributors are their own.
