This is the first blog post in a series where I hope to raise a bit above the technical stuff and instead focus on how we can put Big Data to effective use. I ran a SkySQL Webinar on the subject recently that you might also want to watch, and a recording is available here: http://bit.ly/17TTQnJ
Yes, so what? Why do you need or want all that data? All data you need from your customers you have in your Data Warehouse, and all data you need on the market you are in, you can get from some analyst? Right?
Well, yes, that is one source of data, but there is more to it than that. The deal with Data is that once you have enough of it, you can start to see things you haven't seen before. Trend analysis is only relevant when you have enough data, and the more you have, the more accurate it gets.Big Data is different from the data you already have in that it is Bigger, hence the name, but not only that. Big Data also contains much more diverse types of data, such as images, sound, metadata and video. Also, Big Data has much more new data coming in and is constantly shifting. Research says that each day some25 quintillion bytes of data is created, this is 25 000 000 000 000 000 000 bytes, if you ask (which is some 25 000 petabytes or 25 000 000 terabytes). And yes, that is every day. (and yes, this is using 1000 bytes per kb, not 1024 per Kb).
As I already said, what is interesting with such huge amounts of data is that once the volumes are high enough, is that you can infer things that you couldn't with smaller or more focused data. You can infer changes that you couldn't before and in some sense make qualified predictions on what will happen in the world. Does this sound like rocket science? Well, it shouldn't and truth is that we have been doing this in at least one field for a very long time, since the 1950's or so, and this was one of the first application for computers. And no, I'm not talking about Angry Birds here.
What I am talking about is weather forecasting. Using knowledge about how winds blow, temperatures, geographies and statistics, we can reasonably well predict how the weather will be. As we all know, these forecasts aren't always right, but even when they go wrong, we get to know why they went wrong. The way these predictions work is to combine large amounts of data with experience and hard facts on how the weather behaves, and the data isn't directly related to the area where we try to predict the weather either. We can do very little to influence the weather, except of course plan a picnic which is sure to create thunderstorms.
In the case of, say, sales of some consumer product, we are actually able to influence this more than we can influence the weather. And if we then add our knowledge of our market and the dynamics of it and combine that with truckloads of related and semi-related data, why shouldn't we be able to do some predictions. Not in the sense of knowing exactly what will happen in the future, but at least have an idea of what is the most likely thing to happen and have an idea of the likelihood that this will be so. Which is how weather forecasts work.
But this isn't all there is to it. Let's pop back to weather forecasting for a second. The analysis done on weather systems is a lot more complex than that done in most data warehouses, there is more to this than some summaries and averages. Also, the way this is presented: Using a way with an overlay of symbols (a Sun, a Cloud, some poor soul planning a picnic) is different from how we are used to see trend data in our data houses.
- We need ways of dealing with large amount of fast moving data - Big Data
- We need new, better and more specialized analysis - Big Analytics
- We need new ways to view data - Visualizations
I'll be back soon with something more specific on this subject, so don't touch that dial!