Of the myriad terms that the tech industry throws around at the moment, none is as often subverted for marketing spin as “big data”. So much so that few people can actually agree on what big data is. For my part, I’ll revert to Wikipedia’s definition, which states:
big data is a collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to “spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions.”
So, with the definition out of the way, on to the bigger question. Is big data simply a marketing term, or is it something that’s actually being looked at within the enterprise? A new survey from RainStor indicates that big data is indeed being taken seriously. Here’s a summary of the findings.
- The promise of big data and its value to the organization – 75.5% of respondents agreed that managing their big data and making it available across the enterprise is important to improving overall business value.
- Velocity and variety of data continue to present some of the biggest challenges – the survey reveals that the speed of data creation (velocity) and the increase in data types (variety) are main challenges, in addition to the ability to provide analytics against this data, which received 37% of respondents’ votes.
- New skills are needed – a lack of relevant skills in newer technologies such as Hadoop was a prominent theme, whereas standard SQL queries and SQL statements still appear to be the “enterprise standard” when running queries and analysis against existing data warehouses.
It’s clear that the analysis and extraction of insight from the ever-increasing quantity of data available to an organization is critical, and will become increasingly so. That said, it’s also clear that we’re at a very early stage in the process: the closest most organizations get to “big data” is running traditional data warehouse BI operations. There is, however, a convergence of infrastructure availability (powered by the cloud) and ever-increasing quantities of data (from social and other streams). Combine these two trends with some new ways of analyzing unstructured data and you have a space that is going to continue growing.
The key to ensuring that organizations can actually use this data for positive outcomes, however, is to simplify its analysis. With demand for data scientists ever increasing, it will become more and more important to automate the identification of what is, and is not, valuable data. Other than the largest enterprises, organizations cannot afford to invest in in-house data scientists, and it is for this reason that both traditional approaches to querying data and a new generation of automated data extraction tools will come to the fore in the next few years.
It is worth noting that nearly 90% of respondents are still using SQL queries and SQL statements in their data warehouse environment – an indication that, for all the hype, Hadoop and MapReduce access technologies are currently available only to the most advanced of organizations.