The Big Data Challenge
Big Data analytics comprises the following steps:
- Data Collection - collect unstructured and structured data from a variety of conventional and non-conventional sources, including machine sensors.
- Data Storage - store data in robust, distributed, scalable storage built on commodity hardware, with replicated copies.
- Descriptive Analytics - summarize the data and develop data visualizations.
- Predictive Analytics - develop a model from the available data using supervised learning algorithms.
- Prescriptive Analytics - develop a story for acting on the predictions.
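To make the five steps concrete, here is a minimal sketch in plain Python that walks a toy data set through each of them. Everything in it is an assumption made for illustration: the two feeds, their values, the function names, and the capacity threshold are hypothetical, and a hand-written least-squares line stands in for whatever supervised learning algorithm a real project would choose.

```python
from statistics import mean

# Hypothetical readings from two sources (values invented for illustration).
sensor_feed = [{"hour": 1, "load": 12.0}, {"hour": 2, "load": 14.0}]
app_log_feed = [{"hour": 3, "load": 16.0}, {"hour": 4, "load": 18.0}]


def collect(*sources):
    """Data Collection: merge records from every available source."""
    return [record for source in sources for record in source]


def store(records, replicas=3):
    """Data Storage: keep several copies, imitating replication."""
    return [list(records) for _ in range(replicas)]


def describe(records):
    """Descriptive Analytics: summarize the stored data."""
    loads = [r["load"] for r in records]
    return {"count": len(loads), "mean": mean(loads), "max": max(loads)}


def fit_line(records):
    """Predictive Analytics: ordinary least squares, load = slope*hour + intercept."""
    xs = [r["hour"] for r in records]
    ys = [r["load"] for r in records]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def prescribe(predicted_load, capacity=20.0):
    """Prescriptive Analytics: turn a prediction into a recommended action."""
    return "add capacity" if predicted_load > capacity else "no action"


records = collect(sensor_feed, app_log_feed)
replicas = store(records)
summary = describe(replicas[0])
slope, intercept = fit_line(replicas[0])
forecast = slope * 6 + intercept  # predicted load at hour 6
print(summary, forecast, prescribe(forecast))
```

In a real deployment each function would be replaced by a distributed component, but the hand-off between the steps stays the same: collected records are stored, stored records are summarized and modeled, and the model's output drives a recommendation.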
We have not yet taken a skeptical view, which helps in refining these Big Data steps. Below I discuss a few challenges in extracting the true value of Big Data, along with some scenarios and remedies.
- Unidentified or hidden data sources: a few data sources may be overlooked in the data collection step. Big Data does not limit the number of data sources and encourages gathering data from every available source. As a rule of thumb, all data should be collected when solving a Big Data problem. With this understanding, we need to ensure adequate security for the "all data" approach. Multiple teams can contribute to data collection.
- Data silos can arise in the data storage step because of data security concerns and the lack of a unified data service layer and unified data modeling. We can address this challenge with a unified data model that defines business entities, a unified service layer, and security implemented as authentication and authorization. There is also the emerging concept of the Data Lake, which requires data to be stored according to a schema agreed between producer and consumer.
- Analytics has traditionally been associated with smaller data sets and performed in OLAP mode. It will be hard to replace or enhance existing analytics/BI tools unless we convince stakeholders of the advantages Big Data brings to analytics: real-time analytics and parallel processing of larger data sets. Algorithms are being ported to Big Data software packages, which is very exciting. Big Data technologies will leverage existing analytics platforms such as R, Python, and SAS, and provide a unified platform for analytics. Big Data talent also has to merge with analytics skills to deliver descriptive, predictive, and prescriptive analytics.
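The remedy for data silos described above (an agreed schema plus a unified service layer with authorization) can be sketched roughly as follows. This is only an illustration under stated assumptions: the schema fields, the role names, and the `DataService` class are hypothetical, the store is an in-memory list, and authentication is assumed to have happened before a role reaches this layer.

```python
# Hypothetical schema agreed between a producer and its consumers.
AGREED_SCHEMA = {"customer_id": int, "region": str, "amount": float}

# Hypothetical role-to-permission mapping for the unified service layer.
PERMISSIONS = {"analyst": {"read"}, "ingest_job": {"read", "write"}}


def conforms(record, schema=AGREED_SCHEMA):
    """Return True when the record has exactly the agreed fields and types."""
    return record.keys() == schema.keys() and all(
        isinstance(record[field], expected) for field, expected in schema.items()
    )


class DataService:
    """Single entry point to the store, so teams do not build private silos."""

    def __init__(self):
        self._rows = []

    def _authorize(self, role, action):
        # Authorization only: the caller's identity is assumed to be verified already.
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"{role!r} may not {action}")

    def write(self, role, record):
        self._authorize(role, "write")
        if not conforms(record):
            raise ValueError("record violates the agreed schema")
        self._rows.append(record)

    def read(self, role):
        self._authorize(role, "read")
        return list(self._rows)


service = DataService()
service.write("ingest_job", {"customer_id": 7, "region": "EMEA", "amount": 19.5})
print(service.read("analyst"))
```

Routing every read and write through one layer is the design choice that matters here: the schema check and the permission check live in a single place instead of being reimplemented, or skipped, by each team.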
Big Data analytics is now moving toward accurately defining data, handling it uniformly, and developing data-driven smart products.