Alec Noller09/24/14
11903 views
0 replies

Dev of the Week: Sander Mak

This week we're talking to Sander Mak, Senior Software Engineer at Luminis Technologies, JavaOne Rockstar, and featured author in DZone's 2014 Guide to Big Data.

Benjamin Ball09/24/14
8003 views
0 replies

Addressing the Problems of Big Data: Answers from 6 Experts on Twitter

Monday was the official launch of DZone's 2014 Guide to Big Data, and to kick off the event we gathered a panel of six Big Data specialists to participate in DZone's first Twitter Q&A, which appeared under the #DZBigData hashtag.

David Mai09/24/14
1544 views
1 replies

Is ETL on Life Support?

Two years ago, Phil Shelley, the former Chief Technology Officer of Sears Holdings and CEO of Metascale, predicted the death of ETL. Now, two years later, let’s take a look at what has happened to ETL.

David Mai09/24/14
1212 views
0 replies

2014 Magic Quadrant for Data Integration Tools: Déjà vu All Over Again

A brief review that summarizes the few changes in the Magic Quadrant for Data Integration Tools between 2013 and 2014.

Raymond Camden09/23/14
3021 views
0 replies

Using the New York Times API to Chart Occurrences in Headlines

This weekend I discovered that the New York Times has a pretty deep developer API. I thought I'd try to build a little experiment. What if we could use the API to map the number of times a keyword appeared in headlines over time?

Alec Noller09/22/14
12688 views
1 replies

Introducing DZone's 2014 Guide to Big Data

DZone's 2014 Guide to Big Data was produced to help you discover emerging information about the Big Data landscape and learn about how the shifting needs of data scientists and developers are influencing new tools and technologies.

Trevor Parsons09/19/14
4091 views
0 replies

How to Avoid the Big Data Black Hole

Data collection should be synthesized into meaningful events. The goal is to get users hooked on a platform through the quality and frequency of the decisions it supports, rather than encouraging them to spin the wheel to see what happens, so that the platform becomes a fifth limb.

Steve Hanov09/19/14
4660 views
1 replies

A Quick Measure of Sortedness

How do you measure the "sortedness" of a list? Here, I propose another measure for sortedness. The procedure is to sum the absolute difference between the position of each element in the sorted list, x, and where it ends up in the unsorted list, f(x). We divide by the square of the length of the list and multiply by two, because this gives us a nice number between 0 and 1. Subtracting from 1 makes it range from 0, for completely unsorted, to 1, for completely sorted.
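The procedure described in this teaser can be sketched in a few lines of Python. This is only an illustration, not the author's code: the function name `sortedness` is hypothetical, and it assumes the per-element difference is taken as an absolute value, since signed differences over a permutation would always sum to zero.

```python
def sortedness(items):
    """Return a score near 0 for a reversed list and 1 for a sorted list."""
    n = len(items)
    if n == 0:
        return 1.0  # assumption: an empty list counts as sorted
    # order[x] is the index in the unsorted list of the element whose
    # sorted position is x, i.e. f(x) in the description above.
    order = sorted(range(n), key=lambda i: items[i])
    # Sum |x - f(x)| over all sorted positions x (absolute value assumed).
    total = sum(abs(x - fx) for x, fx in enumerate(order))
    # Divide by n squared, multiply by two, and subtract from 1.
    return 1.0 - 2.0 * total / (n * n)
```

For an even-length list in reverse order the displacements sum to exactly half of n squared, so the score is 0; for odd lengths the reversed list scores slightly above 0.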

Mihai Dinca - P...09/19/14
6032 views
2 replies

The Java Versions War

Do not take for granted that if your application works with Java version X it will automatically and flawlessly work with any Java version Y > X.

Mark Needham09/19/14
4717 views
0 replies

R: Calculating rolling or moving averages

I’ve been playing around with some time series data in R and since there’s a bit of variation between consecutive points I wanted to smooth the data out by calculating the moving average.
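The linked article works in R; as a language-neutral illustration of the smoothing idea, here is a minimal Python sketch of a simple (unweighted, trailing) moving average. The function name `moving_average` and the choice to emit values only once a full window is available are assumptions, not taken from the article.

```python
def moving_average(values, window):
    """Trailing simple moving average; one output per full window."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the point that just fell out of the window.
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages
```

A larger `window` smooths more aggressively but yields fewer points and lags further behind the raw series.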

Mark Needham09/19/14
3276 views
0 replies

R: ggplot – Plotting a single variable line chart

I’ve been learning how to do moving averages in R and having done that calculation I wanted to plot these variables on a line chart using ggplot.

Alec Noller09/17/14
11094 views
0 replies

Dev of the Week: Chanwit Kaewkasi

This week we're talking to Chanwit Kaewkasi, Assistant Professor at the Suranaree University of Technology’s School of Computer Engineering in Thailand, co-developer of a series of low-cost Big Data clusters, and featured author in DZone's upcoming 2014 Guide to Big Data.

Chris Odell09/17/14
2087 views
0 replies

Why I will Always Try And Find A Ready-Built Library

By the time you have developed something and fixed any issues with it, your version is simply not going to be as well tested as a ready-built component that is used by thousands of people.

Alec Noller09/15/14
2479 views
1 replies

Join Us For Our Big Data Twitter Q&A! #DZBigData

In anticipation of our 2014 Guide to Big Data, we have arranged for a panel of experts, including Kirk Borne and Jonathan Ellis, to answer your Big Data questions on Twitter on Monday, September 22, 2014. To participate, simply ask a question using the hashtag #DZBigData.

Drew Harvey09/15/14
3505 views
0 replies

DB2 CONCAT (Concatenate) Function

The DB2 CONCAT function combines two separate expressions to form a single string expression. It can use database fields or explicitly defined strings as one or both expressions when concatenating the values together.

Rob J Hyndman09/12/14
5861 views
0 replies

Generating quantile forecasts in R

A “quantile forecast” is a quantile of the forecast distribution. Still assuming normality, we could generate the forecast quantiles from 1% to 99% in R using...
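The teaser's R code is cut off, so the following is not a reconstruction of it. As a rough Python sketch of the same idea, assuming a normal forecast distribution with a known point forecast `mean` and forecast standard deviation `sd` (both hypothetical parameter names), the 1% to 99% quantiles follow from the inverse normal CDF:

```python
from statistics import NormalDist


def quantile_forecasts(mean, sd):
    """Return (probability, quantile) pairs for 1% through 99%."""
    dist = NormalDist(mu=mean, sigma=sd)
    pairs = []
    for percent in range(1, 100):
        p = percent / 100
        # inv_cdf gives the value below which a fraction p of the
        # forecast distribution lies.
        pairs.append((p, dist.inv_cdf(p)))
    return pairs
```

Under normality the 50% quantile coincides with the point forecast, and the other quantiles are symmetric around it.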

Rick Delgado09/12/14
874 views
1 replies

How to Educate Employees on Keeping Data Safe

Computer security breaches can end up costing even the average small business up to $200,000.

Ajitesh Kumar09/11/14
2094 views
0 replies

How to Start a Big Data Practice

This article presents key aspects of starting up a Big Data practice in your organization. I have recently started working in this area, and this post is the result of my research. I hope you find it useful.

Kai Wähner09/11/14
6842 views
0 replies

Comparison of Alternatives for Stream Processing and Streaming Analytics

The article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from. It also compares open source and proprietary stream processing and streaming analytics alternatives: Apache Storm, Spark, IBM InfoSphere Streams, TIBCO StreamBase, Software AG's Apama, etc.

Alec Noller09/10/14
9821 views
0 replies

Dev of the Week: Adam Diaz

Every week at DZone, we feature a new developer/blogger to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Adam Diaz, Hadoop Architect at the Teradata Big Data Center of Excellence and featured author in DZone's upcoming 2014 Guide to Big Data.

G. Ryan Spain09/05/14
3613 views
0 replies

Stinger.next: The Future of SQL in Hadoop

Hortonworks’ Stinger Initiative, which finished rolling out in April, expanded on the Hive engine to allow for interactive SQL queries at the Hadoop scale. Now Hortonworks has announced their next set of objectives for Hive, which they are calling Stinger.next.

Maarten Ectors09/05/14
4281 views
0 replies

Instant Big Data Stream Processing = Instant Storm

Every 6 months at Canonical, the company behind Ubuntu, I work on something technical to test our tools first hand and to show others new ideas. This time around I created an Instant Big Data solution, more concretely “Instant Storm”.

G. Ryan Spain09/05/14
5329 views
0 replies

Changing Our Views on Using and Analyzing Big Data with Hadoop

In 2006, Hadoop became a predominant solution in the world of Big Data, and it remains a major player for processing Big Data today. But as needs for Big Data analysis expand and evolve, some analysts and developers consider Hadoop unable to perform to their standards.

Trevor Parsons09/04/14
3789 views
0 replies

What is Syslog?

Syslog has been around for a number of decades and provides a protocol used for transporting event messages between computer systems and software applications. The protocol utilizes a layered architecture, which allows the use of any number of transport protocols for transmission of syslog messages.

Mark Needham09/04/14
6293 views
0 replies

R: dplyr - group_by dynamic or programmatic field

In my last blog post I showed how to group timestamp based data by week, month and quarter. I wanted to pull this code out into a function. It turns out if we want to do this then we actually want the regroup function rather than group_by: