There is a lot going on behind the scenes.
Hadoop + Impala will give us an easy way to analyze large datasets using SQL, with the ability to scale even on old hardware.
Computation outside of a database is an alternative to expanding storage capacity.
There are several other blogs on forecasting that readers might be interested in. Here are seven worth following.
Here is a quick intro screencast on Big Data and creating MapReduce jobs in C# to distribute the processing of large volumes of data, leveraging Microsoft Azure HDInsight (Hadoop on Azure).
This installment of Arthur Charpentier's regular collection of data science-related links includes 9 problems with big data, the "Wonk Bubble" and big data journalism, advice from Cathy O'Neil about putting your trust in data analysis, and how big data is the next frontier for innovation, competition, and productivity.
Machine learning is, in Arthur Samuel's classic phrase, the "field of study that gives computers the ability to learn without being explicitly programmed."
There are many instances where we will have to write custom queries with Hibernate.
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Apr. 11 to Apr. 17). This week's best include a guide to real-time big data, an evaluation of big data platforms, and the dark sides of Lucene.
Teradata unveiled a number of enhancements to its core data management offerings. One announcement stood out: the launch of QueryGrid, a tool designed to orchestrate the execution of analytic processing across parallel databases.
The author has been using Lucene for the past six or seven years, and after his last post, he thought it would be a good idea to talk a bit about the kind of things that it isn't doing well.
Typing tables in LaTeX can get messy, but there are some good tools to simplify the process.
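One commonly recommended helper (my example, not named in the original post) is the booktabs package, which replaces noisy `\hline` rules with cleaner spacing; a minimal table might look like:

```latex
\documentclass{article}
\usepackage{booktabs} % cleaner horizontal rules than \hline

\begin{document}
\begin{table}[ht]
  \centering
  \begin{tabular}{lrr}
    \toprule
    Method   & Runtime (s) & Error \\
    \midrule
    Baseline & 12.4        & 0.31  \\
    Improved & 8.7         & 0.25  \\
    \bottomrule
  \end{tabular}
  \caption{Example layout with booktabs; the values are illustrative.}
\end{table}
\end{document}
```

The convention with booktabs is to avoid vertical rules entirely and let `\toprule`/`\midrule`/`\bottomrule` carry the structure.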
So, whatever the test, we always reject the hypothesis that there is a seasonal unit root. That does not mean we cannot have a strong cycle! In fact, the series is almost periodic. But there is no unit root!
The hypothesis of this theorem is that the underlying distribution has a mean. Let's see where things break down if the distribution does not have a mean.
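A standard example of a distribution without a mean is the Cauchy distribution. A quick simulation (a sketch of the idea, not code from the original post) shows that the running sample mean of Cauchy draws never settles down the way the law of large numbers would otherwise guarantee:

```python
import math
import random

def cauchy_sample(rng):
    # Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy
    # when U is uniform on (0, 1).
    return math.tan(math.pi * (rng.random() - 0.5))

def running_means(n, seed=0):
    """Return the sequence of partial sample means of n Cauchy draws."""
    rng = random.Random(seed)
    total = 0.0
    means = []
    for i in range(1, n + 1):
        total += cauchy_sample(rng)
        means.append(total / i)
    return means

means = running_means(100_000)
# Unlike a distribution with a finite mean, the running average keeps
# jumping as occasional huge draws dominate the sum; checkpoints at
# n = 1,000, 10,000, and 100,000 do not converge toward a fixed value.
print(means[999], means[9_999], means[99_999])
```

The culprit is the Cauchy's heavy tails: a single extreme draw can be as large as the sum of everything before it, so averaging never stabilizes.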
The database plugin in IntelliJ IDEA is a useful tool for working with data in databases. As long as we have a JDBC driver to connect to the database, we can configure a data source.
SQL as an interface to big data operations is desirable for the same reasons the author found it useful, but it also raises performance expectations that traditional MapReduce-style jobs cannot meet: such jobs tend to have completion times in the tens of minutes to hours, rather than seconds.
Infochimps has moved in a different direction, focusing far more attention on the tools and services required to work with data and less on offering a place for customers to find data. We touch on Hadoop's role within the growing big data ecosystem, asking whether it's as important as its backers tend to claim.
Get down with R and start visualizing your data in a whole new way!
In most of these applications, you have to deal with evented data which comes in “in real-time”. Data is constantly changing and you usually want to consider the data over a certain time frame (“page views in the last hour”), instead of just taking all of the past data into account.
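The "page views in the last hour" idea can be sketched as a simple time-based sliding window; the class and method names below are hypothetical, chosen just to illustrate the eviction logic:

```python
import time
from collections import deque

class SlidingWindowCounter:
    """Counts events that occurred within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # event timestamps, oldest first

    def record(self, timestamp=None):
        """Record one event; defaults to the current wall-clock time."""
        self.events.append(time.time() if timestamp is None else timestamp)

    def count(self, now=None):
        """Return how many recorded events fall inside the window."""
        now = time.time() if now is None else now
        # Evict events that have fallen out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)

# One-hour window: events at t=0 and t=100 have expired by t=4000,
# leaving only the events at t=3000 and t=3900.
views = SlidingWindowCounter(window_seconds=3600)
for t in (0, 100, 3000, 3900):
    views.record(t)
print(views.count(now=4000))  # -> 2
```

Because timestamps arrive in order, eviction from the left of the deque is O(1) amortized per event; real stream processors use the same idea with bucketed or approximate counters to bound memory.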
Make sure you didn't miss anything with this list of the Best of the Week in the Big Data Zone (Apr. 04 to Apr. 10). This week's best include a discussion of interoperability in the Internet of Things (IoT) discipline, a look at Apache Spark, and an adventure into Lucene's indexing process.
For a look at what's been happening outside of the Big Data Zone, we've assembled a collection of links including the 30 best tools for data visualization, different perspectives on Hadoop and related tools, New Relic's Splunk-style Analytics, and the role of big data in the rise of the Internet of Things (IoT).
In this post the author summarizes their notes from a conference that included the following topics: Is Big Data a Big Hype? How do you make sense out of your Big Data? Do we need a new role for Chief Data Officer? What is the business value behind Big Data? Is there a good visualization tool for Big Data?
The solution was to “build the house of data” and for the time being, that means using Hadoop for what it calls internally, “hadumping.”
What is going on here is that the commentators are assuming we live in a noise-free world. However, the world is noisy — real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.
How do you sort on a field value? The answer: not easily.