How to Leverage Big Data like Google?
Recently, I read Why Big Data Projects Fail by Stephen Brobst at http://data-informed.com/why-big-data-projects-fail. I could not agree more with his opinions, which expose a problem I have been worried about. In this article, I will discuss the topic further and warn enterprises against falling into the same pitfall of failure.
Let’s first look at a positive example. As an enterprise that successfully leverages big data, how does Google make use of it?
1. Collect the raw data: capture the contents of each website, e-mail, or cookie, and extract the key information.
2. Create a complex composite index for this information. Needless to say, advertisement-related indices must also be created.
3. Store these indices and the corresponding contents on distributed servers.
4. When users browse websites, search, or view e-mails, Google routes their requests through a complex translation procedure that locates the relevant index entries.
5. Retrieve data from the servers according to the index, and return the search results or advertisements.
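The indexing-and-retrieval core of steps 2–5 can be sketched in miniature as an inverted index. This is an illustrative toy only: the names `build_index` and `search`, and the sample documents, are my own assumptions, not Google’s actual design.

```python
from collections import defaultdict

def build_index(docs):
    """Step 2: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, docs, query):
    """Steps 4-5: translate the query into index lookups, then retrieve content."""
    hits = set(docs)
    for term in query.lower().split():
        hits &= index.get(term, set())
    return [docs[d] for d in sorted(hits)]

# Step 1 stand-in: a tiny corpus of "captured" content.
docs = {
    1: "big data projects fail",
    2: "big data storage with Hadoop",
    3: "business analysis algorithms",
}
index = build_index(docs)          # step 3 would distribute this across servers
print(search(index, docs, "big data"))  # documents 1 and 2 match both terms
```

The point of the sketch is that steps 3 and 5 (storing the index, looking up documents) are mechanical, while steps 2 and 4 (deciding what to index and how to interpret a request) encode the business logic.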
Of the steps above, which are related to the Hadoop architecture? Steps 3 and 5, that is, data storage and data retrieval.
Can steps 3 and 5 be implemented easily? Yes. A Hadoop-like solution offers good scalability and low purchase cost.
Can I operate like Google once steps 3 and 5 are implemented? No, you can’t, because you have not yet implemented the key steps 2 and 4.
What are steps 2 and 4? They are the business analysis algorithms: algorithms designed meticulously by business experts on the basis of data, business knowledge, and market trends. For many enterprises, they constitute the core competency and the business decision-making procedure. This is the “Value” component of the 4V theory.
Why do big data projects fall into the pitfall of failure? Because current big data technology only provides a solution for data storage and query; it lacks a good solution for business analysis, the part that enhances competitiveness and matters most. There is a great gap in between. In fact, current big data technology is a tool for IT experts: they are able to implement MapReduce functions in C++ or Java, but unable to reach the ultimate goal of providing valuable business algorithms.
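The MapReduce programming model mentioned above can be illustrated with a minimal word-count sketch. This is a single-process analogy (function names are mine), not a distributed Hadoop job, but it shows the map and reduce roles an IT expert would implement:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word, 1) for word in record.lower().split()]

def reduce_phase(pairs):
    # Reduce: sum the counts emitted for each distinct word.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

records = ["big data", "big projects"]
pairs = chain.from_iterable(map_phase(r) for r in records)
print(reduce_phase(pairs))  # {'big': 2, 'data': 1, 'projects': 1}
```

Writing such plumbing is straightforward for a programmer; the hard part, as argued above, is deciding which business question the map and reduce logic should answer.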
To avoid this pitfall, enterprises must use an advanced analysis tool that is business-expert-oriented, usable regardless of the user’s technical background, and capable of converting business logic into business algorithms rapidly, intuitively, and conveniently. How about NoSQL or SQL? Neither is ideal. Both are for IT personnel only, owing to their requirement for a strong technical background, their complex operations, and their comparatively weak computation capability.
What are the ideal tools for business experts? From a TCO perspective, I would rather choose the lightweight R language and esProc Desktop than pin my hopes on the heavyweight Teradata Aster or SAP Visual Intelligence. esProc in particular is a desktop business-computation tool designed for business experts: its syntax is easy to use and understand, with low technical requirements. Scripts are aligned automatically, allowing users to observe the results of each step clearly and visually, and results can be referenced directly by cell name, so users can compute freely according to business logic.