Mark Needham 11/21/14
0 replies

R: Joining multiple data frames

I’ve been looking through the code from Martin Eastwood’s excellent talk ‘Predicting Football Using R’ and was intrigued by the code which reshaped the data into the form expected by glm. I really like dplyr’s pipelining function so I thought I’d try to translate Martin’s code to use that and other dplyr functions.

Rob J Hyndman 11/20/14
0 replies

Seasonal Periods

The “frequency” is the number of observations per season. This is the opposite of the definition of frequency in physics, or in Fourier analysis, where “period” is the length of the cycle, and “frequency” is the inverse of period. When using the ts() function in R, the following choices should be used.

Ricky Ho 11/19/14
0 replies

The Common Data Science Project Flow

While working across multiple data science projects, I noticed a recurring pattern among the strategic ones, which suggests that a common methodology can be used. In this post, I want to sketch this methodology at a high level.

Hüseyin Akdoğan 11/16/14
0 replies

Introduction to Elasticsearch Snapshot and Restore Module

When working with large amounts of data, backing it up and, if necessary, restoring it are important requirements. Elasticsearch has a snapshot and restore module that addresses this need.

Benjamin Ball 11/15/14
0 replies

The Best of the Week (Nov 7): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (November 07 - November 14). This week's topics include learning R, sample range estimates, extracting datasets in Excel, R class for wildfire scientists, and converting a named vector to a data frame in R.

Ravi Namboori 11/15/14
0 replies

Big Data For Dummies

This article endeavors to explain how Big Data will bring about changes in information processing in the IT world. Its aim is to reach out to people seeking clarity on this concept, which has been surrounded by so much hype.

Mark Needham 11/14/14
1 reply

R: Converting a named vector to a data frame

I’ve been playing around with igraph’s page rank function to see who the most central nodes in the London NoSQL scene are and I wanted to put the result in a data frame to make the data easier to work with.

Ravi Namboori 11/14/14
0 replies

Tapping Big Data To Your Own Advantage

This is SAS's view on big data. The article discusses how big data can be used to make better decisions, cut costs, and gain advantages.

Arthur Charpentier 11/13/14
0 replies

Extracting Datasets from Excel Files in a Zipped Folder

The title of the post is a bit long, but that’s the problem I was facing this morning: importing datasets from files, online. I mean, it was not a “problem”, more a challenge (I should be able to do it in R, directly).

Jonathan Callahan 11/12/14
0 replies

Using R: R Class for Wildfire Scientists

Mazama Science has just finished creating class materials on using R for the AirFire team at the USFS Pacific Wildland Fire Sciences Lab in Seattle, Washington. Autodidacts new to R should take about 20-30 hrs to complete the course.

Benjamin Ball 11/10/14
0 replies

The Best of the Week (Oct 31): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (October 31 - November 07). This week's topics include getting started with Hadoop and MapReduce, data structural integrity, Big Data goals, prediction intervals, and JSF versus JSP with CRUD applications.

Ajitesh Kumar 11/09/14
0 replies

Learn R: Hello World with R

This article presents some of the basic concepts you need to understand in order to write a Hello World program in the R programming language.

Linda Gimmeson 11/08/14
0 replies

Big Data is Changing the Real Estate Landscape

Big data has a large role to play in the real estate industry.

John Cook 11/07/14
0 replies

How well does sample range estimate range?

I’ve been doing some work with Focused Objective lately, and today the following question came up in our discussion. If you’re sampling from a uniform distribution, how many samples do you need before your sample range has an even chance of covering 90% of the population range?
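The question can be worked through directly under one assumption that the teaser does not spell out: the samples are independent draws from a uniform distribution on [0, 1]. In that case the range of n draws has the cumulative distribution F(r) = n r^(n-1) - (n-1) r^n, so the chance that the sample range covers at least 90% of the population range is 1 - F(0.9). The sketch below searches for the smallest n that gives at least an even chance; the function names are hypothetical and chosen only for illustration.

```python
def prob_range_covers(n, fraction=0.9):
    """Probability that the range of n iid Uniform(0, 1) draws is >= fraction."""
    if n < 2:
        # A single draw has a range of zero, so it cannot cover anything.
        return 0.0
    r = fraction
    # CDF of the sample range for a uniform distribution on [0, 1].
    cdf = n * r ** (n - 1) - (n - 1) * r ** n
    return 1.0 - cdf


def min_samples(fraction=0.9, target=0.5):
    """Smallest n whose sample range covers `fraction` with probability >= target."""
    n = 2
    while prob_range_covers(n, fraction) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = min_samples()
    print(n, round(prob_range_covers(n), 3))
```

Under these assumptions the search stops at n = 17, where the coverage probability first passes one half (it is roughly 0.485 at n = 16 and 0.518 at n = 17).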

Ana-maria Mihalceanu 11/06/14
0 replies

JSF Versus JSP, Which One Fits Your CRUD Application Needs? (Part 1)

We make decisions every day; everything we say and do is the result of a decision, whether we make it consciously or not. No matter how big or small the choice is, there's no (easy) formula for making the right decision.

Mike Bushong 11/05/14
0 replies

Network Engineers, Pay Attention to Big Data

Don’t be afraid of these new applications. They are coming whether you like it or not. Embrace them, understand them as best you can. Then sit back and think about what the network can do for them. You have an ability to significantly impact their ability to perform.

David Mai 11/05/14
0 replies

Salesforce Enters BI & Analytics Market. Will They Create a Wave or Just a Ripple?

Salesforce, the world’s largest enterprise cloud computing company, has recently unveiled “Wave”, its new enterprise business intelligence solution.

Pavithra Gunasekara 11/04/14
0 replies

Getting Started with Hadoop MapReduce

The Hadoop MapReduce framework provides a way to process large data sets in parallel on large clusters of commodity hardware. This post is an edit of an earlier version.
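To make the programming model concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python and does not use Hadoop's API; the function names are hypothetical, and everything runs in a single process, whereas the framework would distribute the map and reduce calls across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Each map call depends only on its own line and each reduce call only on its own key, which is the property that lets a framework run them in parallel.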

Mike Bushong 11/04/14
0 replies

Making the World a Better Place with Big Data

While there is certainly much feel-good hyperbole about the “making the world a better place” nature of big data, that is more than offset by actual real-world details of how data is being used to solve more day-to-day business problems.

Edmund Kirwan 11/03/14
0 replies

The Blighttown Corollary

Can an image capture an entire system's structural integrity? Can we tell from a graphic whether a system is well-structured? The Blighttown corollary highlights the importance of a good package structure, as this structure will probably constrain the quality of the entire system's structure.

Rob J Hyndman 11/03/14
0 replies

Prediction intervals too narrow

Almost all prediction intervals from time series models are too narrow. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. When we produce prediction intervals for time series models, we generally only take into account the first of these sources of uncertainty.

Benjamin Ball 11/01/14
0 replies

The Best of the Week (Oct 24): Big Data Zone

Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (October 24 - October 31). This week's topics include Twitter data analysis, running Hadoop on Ubuntu, information retrieval with Apache Lucene, a method for data visualization, and removing references with R.

Mark Needham 10/29/14
0 replies

Python: Converting a date string to timestamp

I’ve been playing around with Python over the last few days while cleaning up a data set and one thing I wanted to do was translate date strings into a timestamp.
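A minimal sketch of that kind of conversion, using only the standard library. The teaser does not say which date format or time zone the data set used, so the format string below and the choice to treat the input as UTC are assumptions, and `to_timestamp` is a hypothetical helper name.

```python
import calendar
from datetime import datetime


def to_timestamp(date_string, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse date_string with fmt and return whole seconds since the Unix epoch.

    The parsed value is treated as UTC (an assumption); calendar.timegm
    performs the conversion without consulting the local time zone.
    """
    parsed = datetime.strptime(date_string, fmt)
    return calendar.timegm(parsed.timetuple())


if __name__ == "__main__":
    print(to_timestamp("2014-10-29 00:00:00"))
    # A different input layout only needs a different format string.
    print(to_timestamp("29 Oct 2014", fmt="%d %b %Y"))
```

If the strings are in local time instead, `time.mktime` is the usual replacement for `calendar.timegm`, at the cost of making the result depend on the machine's time zone.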

Rob J Hyndman 10/28/14
0 replies

HTS with Regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints.

Pavithra Gunasekara 10/27/14
0 replies

Getting Hadoop Up and Running on Ubuntu

In this post my aim is to get Hadoop up and running on an Ubuntu host in both Local (Standalone) Mode and Pseudo-Distributed Mode.