Alec Noller, 09/10/14
8601 views
0 replies

Dev of the Week: Adam Diaz

Every week at DZone, we feature a new developer/blogger to catch up and find out what he or she is working on now and what's coming next. This week we're talking to Adam Diaz, Hadoop Architect at the Teradata Big Data Center of Excellence and featured author in DZone's upcoming 2014 Guide to Big Data.

G. Ryan Spain, 09/05/14
3436 views
0 replies

Stinger.next: The Future of SQL in Hadoop

Hortonworks’ Stinger Initiative, which finished rolling out in April, expanded on the Hive engine to allow for interactive SQL queries at the Hadoop scale. Now Hortonworks has announced their next set of objectives for Hive, which they are calling Stinger.next.

Maarten Ectors, 09/05/14
4010 views
0 replies

Instant Big Data Stream Processing = Instant Storm

Every 6 months at Canonical, the company behind Ubuntu, I work on something technical to test our tools first hand and to show others new ideas. This time around I created an Instant Big Data solution, more concretely “Instant Storm”.

G. Ryan Spain, 09/05/14
5174 views
0 replies

Changing Our Views on Using and Analyzing Big Data with Hadoop

In 2006, Hadoop became a predominant solution in the world of Big Data, and it remains a major player for processing Big Data today. But as needs for Big Data analysis expand and evolve, some analysts and developers consider Hadoop unable to perform to their standards.

Mark Needham, 09/04/14
6030 views
0 replies

R: dplyr - group_by dynamic or programmatic field

In my last blog post I showed how to group timestamp based data by week, month and quarter. I wanted to pull this code out into a function. It turns out if we want to do this then we actually want the regroup function rather than group_by:

Trevor Parsons, 09/04/14
3553 views
0 replies

What is Syslog?

Syslog has been around for a number of decades and provides a protocol used for transporting event messages between computer systems and software applications. The protocol utilizes a layered architecture, which allows the use of any number of transport protocols for transmission of syslog messages.

G. Ryan Spain, 09/04/14
1425 views
0 replies

Big Data - Link Roundup - September 4, 2014

Links to Big Data Articles and Information, with recent articles on real-world applications of Big Data analysis, thoughts on new and different ways to look at Big Data, and tools for starting Big Data analysis.

Mark Needham, 09/04/14
2803 views
0 replies

R: ggplot - Cumulative frequency graphs

The first step was to transform the data so that I had a data frame where a row represented a day where a member joined the group. To turn that into a chart we can plug it into ggplot and use the cumsum function to generate a line showing the cumulative total:
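The post itself works in R with ggplot. As a rough illustration of the same running-total idea in Python, the sketch below counts joins per day and accumulates them. The `join_dates` list is made-up sample data, not taken from the post, and `itertools.accumulate` stands in for R's `cumsum`.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Hypothetical join dates, one entry per member (illustrative data only).
join_dates = [
    date(2014, 8, 1),
    date(2014, 8, 1),
    date(2014, 8, 3),
    date(2014, 8, 4),
    date(2014, 8, 4),
    date(2014, 8, 4),
]

# Count how many members joined on each day, in chronological order.
per_day = sorted(Counter(join_dates).items())
days = [day for day, _ in per_day]
counts = [count for _, count in per_day]

# Running total of members over time (the analogue of R's cumsum).
cumulative = list(accumulate(counts))

for day, total in zip(days, cumulative):
    print(day.isoformat(), total)
```

Plotting `days` against `cumulative` as a line would give the cumulative frequency graph the teaser describes.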

Alec Noller, 09/03/14
9208 views
0 replies

The Best of DZone: August 27 - September 3

If you missed anything on DZone this week, now's your chance to catch up! This week's best include the anatomy of Hibernate dirty checking, the similarities of Swift and Scala, the Agile version of Superman vs. Batman, and more.

Mark Needham, 09/03/14
4708 views
0 replies

R: Grouping by week, month, quarter

In my continued playing around with R and meetup data I wanted to have a look at when people joined the London Neo4j group based on week, month or quarter of the year to see when they were most likely to do so.
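The post does this in R. A minimal Python sketch of the same grouping idea follows, assuming a plain list of join dates. The helper names `period_key` and `count_by` and the sample dates are hypothetical, chosen only for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical join dates (illustrative data only).
join_dates = [
    date(2014, 1, 15),
    date(2014, 2, 20),
    date(2014, 4, 1),
    date(2014, 4, 2),
]


def period_key(day, period):
    """Map a date to a (year, n) key for the requested period."""
    if period == "week":
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return (iso[0], iso[1])
    if period == "month":
        return (day.year, day.month)
    if period == "quarter":
        return (day.year, (day.month - 1) // 3 + 1)
    raise ValueError("unknown period: " + period)


def count_by(dates, period):
    """Count how many dates fall into each week, month or quarter."""
    return Counter(period_key(d, period) for d in dates)


for period in ("week", "month", "quarter"):
    print(period, sorted(count_by(join_dates, period).items()))
```

Passing the period as a string keeps the grouping field dynamic, so one function covers all three views of the data.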

Kin Lane, 09/03/14
3024 views
0 replies

6,482 Datasets Available Across 22 Federal Agencies In Data.json Files

A list of 22 federal agencies that have published data.json files.

Jennifer Wright, 09/03/14
1094 views
0 replies

Big Data, Big Value

How valuable is big data? It’s an important question for developers, who need to be able to respond to ever-shifting markets quickly so they are not left behind.

Anders Abel, 09/02/14
3632 views
0 replies

A Geek's Nightmare

Last night I woke up after a nightmare. A nightmare containing a future, “improved” version of PowerShell, a competing blogger and Entity Framework Migrations. Slightly off topic, but I’ll share it anyway.

Rob J Hyndman, 08/29/14
1078 views
0 replies

Forecasting with R in WA

On 23–25 September, I will be running a 3-day workshop in Perth on “Forecasting: principles and practice”, mostly based on my book of the same name.

Kai Wähner, 08/29/14
812 views
0 replies

Intelligent Business Process Management Suites (iBPMS) - The Next-Generation BPM for a Big Data World

I gave a talk at ECSA 2014 in Vienna: The Next-Generation BPM for a Big Data World: Intelligent Business Process Management Suites (iBPMS), sometimes also abbreviated iBPM. I want to share the slides with you.

Mark Needham, 08/27/14
4273 views
0 replies

R: Rook - Hello world example - 'Cannot find a suitable app in file'

I’ve been playing around with the Rook library and struggled a bit getting a basic Hello World application up and running so I thought I should document it.

Mikio Braun, 08/27/14
4366 views
0 replies

Big Data & Machine Learning Convergence

As these two fields converge, work has to be done to provide the right set of mechanisms and abstractions. Right now I still think there is a considerable gap which we need to close over the next few years.

Arthur Charpentier, 08/27/14
3669 views
0 replies

Computational Actuarial Science, with R

A collection of datasets, originally for the book ‘Computational Actuarial Science with R’ edited by Arthur Charpentier (CAS with R). Now, the package contains a large variety of actuarial datasets.

Alec Noller, 08/26/14
3783 views
0 replies

Refcard Expansion Pack: Getting Started with Apache Hadoop

This week, DZone released its latest Refcard: Getting Started with Apache Hadoop. If you're interested in learning more about Hadoop or sharpening your skills, we've dug into the DZone archives to find some of the most popular posts we've had on the topic.

Mahboob Hussain, 08/26/14
9659 views
11 replies

Thoughts on Hibernate

There is a mismatch between the way data are laid out in the columns of tables and the way they are used in the application as class or instance variables. However, this mismatch, or "impedance", does not get in the way of software development to the extent that it requires a framework that abstracts away all the database-related code.

Saurabh Chhajed, 08/26/14
4125 views
3 replies

How to Setup Realtime Analytics over Logs with ELK Stack

The ELK stack is ElasticSearch, Logstash and Kibana. Together, these three provide a fully working real-time data analytics tool for getting useful information out of your data.

Giuseppe Vettigli, 08/22/14
4948 views
0 replies

Quick HDF5 with Pandas

HDF5 is a format designed to store large numerical arrays of homogeneous type. It comes in particularly handy when you need to organize your data models in a hierarchical fashion and you also need a fast way to retrieve the data. Pandas implements a quick and intuitive interface for this format, and in this post I will briefly introduce how it works.

Robert Diana, 08/21/14
4686 views
0 replies

Geek Reading August 20, 2014

These items are a combination of tech business news, development news and programming tools and techniques.

Gil Allouche, 08/20/14
18371 views
0 replies

Hadoop 101: An Explanation of the Hadoop Ecosystem

Hadoop is not a single piece of technology. It's composed of an entire ecosystem of tools companies can choose from to create their big data solution.

Doug Turnbull, 08/20/14
2561 views
0 replies

Introducing Splainer: The Open Source Search Sandbox That Tells You Why

This is the entire art and science of search relevancy. It's not magic gnomes inside a box that understand all about baby bottles. No, it's heavily tuned heuristics that Solr and Elasticsearch use out of the box.