Big Data Zone Link Roundup (Apr. 12)

For a look at what's been happening outside of the Big Data Zone, we've assembled a collection of links from around the web covering all the tutorials, tools, new releases, rants, and raves you might have missed over the past couple of weeks:

Tutorials & Tools

30 Best Tools for Data Visualization

The range of technologies available for collecting and examining data is constantly growing, in both web and desktop applications, which provide several great interfaces.

Sanitize your data from your Rails + Postgres application

This time I would like to show you how to sanitize your database using a very simple Ruby script. This example has a very specific goal. You will find it useful if you are using Postgres as your database engine and Rails as your backend platform. Rails is widely used with Postgres, so I think this example could come in handy for a great number of developers.


OlegDB is a database that meets the bottom line head-on. It operates under a startling new enterprise-ready paradigm which we call MAYO: Marginally available Yoke and Oil database.

Splunk's Big Data Promise: Google for Your Visual Analytics

Splunk, nominally used for system logs, shows signs of evolving into a data processing platform via a partnership with Tableau Software.


Piglet is a simple environment for processing and analyzing small data sets, inspired by Apache Pig. It supports a small number of Pig-inspired capabilities, including a simple type system (BYTEARRAY, CHARARRAY, LONG, DOUBLE, BOOLEAN).

Apache Storm and Hadoop

In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on the processing of data.

Experiment with Embedded Solr in Java and C

Solr is the poster boy of the search market. It is generally run as an HTTP server that answers queries. I was more interested in running it in an embedded manner and getting results via JNI.

Pythons, Elephants, and Whales

When we set out to look at natural language processing solutions, the first thing we realized was that NLTK has found a sweet spot. NLTK (or Natural Language ToolKit) is a Python solution for natural language processing. It is able to solve many (if not all) of the problems that heavier, more established solutions like the Stanford NLP tools can. However, unlike those tools, it is accessible to those of us who are not experts in NLP.

Google Compute Engine VMs provide a fast and reliable way to run Apache Hadoop. Today, we’re making it easier to run Hadoop on Google Cloud Platform with the Preview release of the Google Cloud Storage connector for Hadoop that lets you focus on your data processing logic instead of on managing a cluster and file system.

The Truth About MapReduce Performance on SSDs

In the Big Data ecosystem, solid-state drives (SSDs) are increasingly considered a viable, higher-performance alternative to rotational hard-disk drives (HDDs). However, few results from actual testing are available to the public.

News & Opinion

Europe Deadlocked Over Data Protection Reform 

Foremost in recent discussions has been the need to consolidate definitions of differing levels of privacy risk; from personally identifiable records through to truly anonymous information. One sticking point has been where information falls somewhere between these two extremes. The latest proposal includes an attempt to establish a third, intermediate classification, but this step is easier said than done.

LucidWorks, Hortonworks Team Up to be Hadoop's Search Engine

With Hadoop turning into a one-size-fits-all repository for data, an array of search solutions specifically for Hadoop have come to the fore over the past year. One of those contenders, LucidWorks, has joined with Hortonworks, one of the major distributors of Hadoop, to offer the LucidWorks edition of Hadoop search engine Solr as a reference architecture for searches on the Hortonworks Data Platform, or HDP.

The Internet of Things is Embedding Itself Into Everyday Items

Part of the reason internet-connected devices are increasingly common is a decrease in costs: not only are sensors becoming cheaper to make, but it is also less expensive to store the data.

New Relic Debuts Splunk-style Analytics for Software

New Relic Insights allows developers to harvest real-time statistics about running apps and crunch the results in its cloud.

What Is Hadoop Exactly? A Cynic's Theory

Anything that looks too good to be true usually is. Such might be the case with Apache Hadoop, the much-ballyhooed open-source project that everyone keeps talking about.

Paul Maritz at Structure: Hadoop is Just One Ingredient of a ‘Profound Shift’ in Software

The possibilities are far reaching, but it requires a mindset shift. For old industries, it requires leadership with clear enough vision to respond to the opportunities of data centric applications and services.

IBM Rolls Big Data Software to Combat Big Business Fraud

The Counter Fraud Management Software offering uses analytics to root out fraudulent claims.

Why You Should Never Trust Data Visualization

Pete Warden is spot on about being skeptical of data, but it is data visualization, not data science, where caution is most crucial.


Opinions expressed by DZone contributors are their own.
