Mark Needham, 11/23/14
5878 views
0 replies

R: Refactoring to dplyr

I’ve been looking back over some of the early code I wrote using R before I knew about the dplyr library and thought it’d be an interesting exercise to refactor some of the snippets.

Eric D. Schabell, 11/23/14
1157 views
0 replies

How To Setup Big Data Tooling For JBoss Developer Studio 8

The release of the latest JBoss Developer Studio (JBDS) raises questions about how to get started with the various JBoss Integration and BPM product tool sets that are not installed out of the box.

Mark Needham, 11/22/14
6134 views
0 replies

R: dplyr - Group by field dynamically

A few months ago I wrote a blog post explaining how to dynamically/programmatically group a data frame by a field using dplyr, but that approach has been deprecated in the latest version. It turns out the ‘group_by_’ function doesn’t want to receive a list of fields, so let’s remove the call to list:

Mark Needham, 11/21/14
5701 views
0 replies

R: Joining multiple data frames

I’ve been looking through the code from Martin Eastwood’s excellent talk ‘Predicting Football Using R’ and was intrigued by the code which reshaped the data into the form expected by glm. I really like dplyr’s pipelining function, so I thought I’d try to translate Martin’s code to use that and other dplyr functions.

Rob J Hyndman, 11/20/14
1462 views
0 replies

Seasonal Periods

The “frequency” is the number of observations per season. This is the opposite of the definition of frequency in physics, or in Fourier analysis, where “period” is the length of the cycle, and “frequency” is the inverse of period. When using the ts() function in R, the following choices should be used.

Ricky Ho, 11/19/14
2991 views
0 replies

The Common Data Science Project Flow

While working across multiple data science projects, I observed a similar pattern across a group of strategic data science projects where a common methodology can be used. In this post, I want to sketch this methodology at a high level.

Hüseyin Akdoğan, 11/16/14
1411 views
0 replies

Introduction to Elasticsearch Snapshot and Restore Module

When working with large amounts of data, backing up and, if necessary, restoring that data are important requirements. Elasticsearch has a snapshot and restore module that addresses this need.

Benjamin Ball, 11/15/14
3654 views
0 replies

The Best of the Week (Nov 7): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (November 07 - November 14). This week's topics include learning R, sample range estimates, extracting datasets in Excel, R class for wildfire scientists, and converting a named vector to a data frame in R.

Ravi Namboori, 11/15/14
1947 views
0 replies

Big Data For Dummies

This article endeavors to explain how Big Data will bring about changes in information processing in the IT world. Its aim is to reach out to people seeking clarity on this concept, which has been surrounded by so much hype.

Mark Needham, 11/14/14
7152 views
1 reply

R: Converting a named vector to a data frame

I’ve been playing around with igraph’s page rank function to see who the most central nodes in the London NoSQL scene are, and I wanted to put the result in a data frame to make the data easier to work with.

Ravi Namboori, 11/14/14
1390 views
0 replies

Tapping Big Data To Your Own Advantage

This is SAS's view on big data. The article discusses how big data can be used to make better decisions, cut costs, and gain advantages.

Arthur Charpentier, 11/13/14
2800 views
0 replies

Extracting Datasets from Excel Files in a Zipped Folder

The title of the post is a bit long, but that’s the problem I was facing this morning: importing datasets from files, online. I mean, it was not a “problem”, more a challenge (I should be able to do it in R, directly).

Jonathan Callahan, 11/12/14
2007 views
0 replies

Using R: R Class for Wildfire Scientists

Mazama Science has just finished creating class materials on using R for the AirFire team at the USFS Pacific Wildland Fire Sciences Lab in Seattle, Washington. Autodidacts new to R should take about 20-30 hrs to complete the course.

Benjamin Ball, 11/10/14
1038 views
0 replies

The Best of the Week (Oct 31): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (October 31 - November 07). This week's topics include getting started with Hadoop and MapReduce, data structural integrity, Big Data goals, prediction intervals, and JSF versus JSP with CRUD applications.

Ajitesh Kumar, 11/09/14
7186 views
0 replies

Learn R: Hello World with R

This article presents some of the basic concepts you need to understand in order to write a Hello World program using the R programming language.

Linda Gimmeson, 11/08/14
879 views
0 replies

Big Data is Changing the Real Estate Landscape

Big data has a large role to play in the real estate industry.

John Cook, 11/07/14
4393 views
0 replies

How well does sample range estimate range?

I’ve been doing some work with Focused Objective lately, and today the following question came up in our discussion. If you’re sampling from a uniform distribution, how many samples do you need before your sample range has an even chance of covering 90% of the population range?
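One way to explore this question is with the standard result for the range of n independent draws from a uniform distribution: relative to the population range, the probability that the sample range is at most r is n·r^(n-1) − (n-1)·r^n. The sketch below searches for the smallest n whose coverage probability reaches the target. It is a minimal illustration under that assumption, not necessarily the approach taken in the post, and the function names `prob_range_covers` and `min_samples` are hypothetical.

```python
def prob_range_covers(n: int, fraction: float) -> float:
    """Probability that the range of n uniform draws spans at least
    `fraction` of the population range."""
    if n < 2:
        # A single draw has zero range, so it cannot cover any positive fraction.
        return 0.0
    # CDF of the sample range for a uniform distribution on [0, 1].
    cdf = n * fraction ** (n - 1) - (n - 1) * fraction ** n
    return 1.0 - cdf


def min_samples(fraction: float, target: float = 0.5) -> int:
    """Smallest n for which the coverage probability reaches `target`."""
    n = 2
    while prob_range_covers(n, fraction) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = min_samples(0.9)
    print(n, prob_range_covers(n, 0.9))
```

Under this formula, 16 samples fall just short of an even chance of covering 90% of the range, and 17 samples are the first to exceed it.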

Ana-maria Mihalceanu, 11/06/14
2075 views
0 replies

JSF Versus JSP, Which One Fits Your CRUD Application Needs? (Part 1)

We make decisions every day; everything we say and do is the result of a decision, whether we make it consciously or not. No matter how big or small the choice is, there's no (easy) formula for making the right decision.

Mike Bushong, 11/05/14
1308 views
0 replies

Network Engineers, Pay Attention to Big Data

Don’t be afraid of these new applications. They are coming whether you like it or not. Embrace them, understand them as best you can. Then sit back and think about what the network can do for them. You have an ability to significantly impact their ability to perform.

David Mai, 11/05/14
1314 views
0 replies

Salesforce Enters BI & Analytics Market. Will They Create a Wave or Just a Ripple?

Salesforce, the world’s largest enterprise cloud computing company, has recently unveiled “Wave”, its new enterprise business intelligence solution.

Pavithra Gunasekara, 11/04/14
5813 views
0 replies

Getting Started with Hadoop MapReduce

The Hadoop MapReduce framework provides a way to process large data sets in parallel on large clusters of commodity hardware. An edit to an earlier version.

Mike Bushong, 11/04/14
3013 views
0 replies

Making the World a Better Place with Big Data

While there is certainly much feel-good hyperbole about the “making the world a better place” nature of big data, that is more than offset by actual real-world details of how data is being used to solve more day-to-day business problems.

Edmund Kirwan, 11/03/14
2154 views
0 replies

The Blighttown Corollary

Can an image capture an entire system's structural integrity? Can we tell from a graphic whether a system is well-structured? The Blighttown corollary highlights the importance of a good package structure, as this structure will probably constrain the quality of the entire system's structure.

Rob J Hyndman, 11/03/14
1984 views
0 replies

Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty.

Benjamin Ball, 11/01/14
3807 views
0 replies

The Best of the Week (Oct 24): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (October 24 - October 31). This week's topics include Twitter data analysis, running Hadoop on Ubuntu, information retrieval with Apache Lucene, a method for data visualization, and removing references with R.