“Big data is any data that when you pile it up reaches into the Cloud.” This was the opening statement from Jack Norris, CMO of MapR, at the Cloud Connect Conference in Santa Clara today. He was paraphrasing the analysts, but it was the ideal framing for the Big Data Track at a Cloud conference.
A new paradigm
According to Norris, big data and Cloud are a paradigm shift and an architectural change that involves putting data and computing power together as a massive processing unit.
Norris drilled into this by describing the challenge facing today’s enterprise: when data and computing are kept separate, processing takes longer and longer as data grows. More and more, organizations need to:
- Process more quickly – Things are moving faster every day and competitive businesses need to keep up
- Combine multiple data sources – Organizations need to blend data to gain insights. That data can’t be stored in one place and can even be outside the organization (such as in the cloud)
- Expand analysis – There are limits on traditional systems and organizations need to go beyond the traditional SQL-based analysis of the past
Hadoop in the Cloud
The most interesting part of the big data story for this setting was how Hadoop and big data are used in the Cloud. For many companies going this direction, Hadoop in the Cloud is a very flexible infrastructure. While we often hear about performance questions with Cloud, Norris cited the current MinuteSort record of 1.5 TB, set by Google working with MapR, as proof that Cloud performance is less and less of a question.
It takes more data
Norris made some of his strongest points with his contention that more data is now filling in the gaps where we used to rely on complex algorithms. Many things, like human behavior, have been deemed too complex to understand completely. Norris pointed to use cases like fraud detection, flu trends, and the Netflix recommendation engine to show that even the most complex behavior becomes predictable when enough data comes to the table.
If this concept is true, then the ability to add data in cost-effective ways becomes one of the most important enterprise strategies available. It’s easy to see where the Cloud plays a critical role in providing a place for that data to be sought, reached, and incorporated efficiently.
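To make the "more data" argument concrete, here is a minimal Python sketch. It is not something Norris presented. It assumes a hypothetical fraud rate of 2% and uses nothing smarter than counting. It then measures how far the counted rate lands from the true rate at different data volumes. The names `mean_abs_error` and `TRUE_RATE`, the sample sizes, and the number of trials are all illustrative assumptions.

```python
import random
import statistics

def mean_abs_error(true_rate, sample_size, trials, rng):
    """Average error of a plain counting estimate of an event rate."""
    errors = []
    for _ in range(trials):
        # Simulate `sample_size` records and count how many are "events".
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        errors.append(abs(hits / sample_size - true_rate))
    return statistics.mean(errors)

rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
TRUE_RATE = 0.02         # hypothetical fraud rate, chosen only for illustration

errors_by_size = {
    n: mean_abs_error(TRUE_RATE, n, trials=100, rng=rng)
    for n in (100, 1_000, 10_000)
}

for n, err in errors_by_size.items():
    print(f"{n:>6} records: mean absolute error {err:.4f}")
```

The estimator never changes; only the volume of data does. In this toy setup the error should shrink roughly with the square root of the record count, which is one simple way to read the claim that adding data can stand in for a more elaborate model.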
Norris provided the following examples of where Hadoop is being used in the Cloud:
- Targeted advertising/clickstream analysis
- Security for anti-virus, fraud detection, and image recognition
- Pattern matching/recommendations
- Data warehousing/BI
- Bio-informatics like genome analysis
- Financial simulation like Monte Carlo
- File processing like image resizing and video encoding
- Web indexing
Big Data lessons
This was a very comprehensive talk and drew a sizable crowd for the last day of the event. Norris closed with “Big Data Lessons from the Cloud”:
- Big Data requires a new approach
- Hadoop is a paradigm shift
- Easy to get started with Hadoop in the Cloud
- Scale clusters up and down in the Cloud
- Only pay for what you use
- Expand data for analysis
- Combine data sources
- New applications from new data sources
- New analytics
- Wide variety of applications appropriate for Hadoop