
The Big Data Challenge

Big Data analytics comprises the following steps:

  1. Data Collection - collect unstructured and structured data from a variety of conventional and non-conventional sources, including machine sensors.
  2. Data Storage - store data in robust, distributed, scalable storage built on commodity hardware, with replicated copies.
  3. Descriptive Analytics - summarize the data and build data visualizations.
  4. Predictive Analytics - develop models from the available data using supervised learning algorithms.
  5. Prescriptive Analytics - develop a story for acting on the predictions.
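
The three analytics steps can be illustrated with a toy example. The following is a minimal sketch, not a production pipeline: it assumes a handful of hypothetical machine-sensor readings held in memory (a real deployment would read them from the distributed storage of step 2), uses an ordinary least-squares line as a stand-in for a supervised learning algorithm, and uses a made-up 55 °C threshold as the prescriptive rule. The names `describe`, `fit_line`, and `recommend` are illustrative only.

```python
from statistics import mean, pstdev

# Hypothetical machine-sensor readings: (hours_in_service, temperature_celsius).
readings = [(1, 40.0), (2, 42.0), (3, 44.0), (4, 46.0), (5, 48.0)]

def describe(values):
    """Descriptive analytics: summarize a numeric column."""
    return {"count": len(values), "mean": mean(values), "min": min(values),
            "max": max(values), "stdev": pstdev(values)}

def fit_line(points):
    """Predictive analytics: ordinary least-squares fit of y = slope * x + intercept.
    It is supervised because every input x comes with a known label y."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def recommend(predicted_temp, limit=55.0):
    """Prescriptive analytics: turn a prediction into an action.
    The 55.0 limit is a hypothetical business rule."""
    return "schedule maintenance" if predicted_temp > limit else "no action"

summary = describe([t for _, t in readings])
slope, intercept = fit_line(readings)
forecast = slope * 10 + intercept  # predicted temperature after 10 hours in service
print(summary, slope, intercept, forecast, recommend(forecast))
```

With these readings the fitted line is temperature = 2.0 * hours + 38.0, so the forecast at 10 hours is 58.0 and the rule recommends maintenance.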


We have not yet discussed the skeptical view, which will help us refine the Big Data steps. Below I discuss a few challenges in extracting the true value of Big Data, along with some scenarios and remedies.

  1. Lack of identification of data sources, or hidden data sources: a few data sources may be overlooked in the data collection step. Big Data does not limit the number of data sources and encourages gathering all data from all available sources; as a rule of thumb, all data needs to be collected to solve a Big Data problem. With this understanding, we need to ensure adequate security for the All Data approach. Multiple teams can contribute to data collection.
  2. Data silos can emerge in the Data Storage step because of data security concerns, the lack of a unified data service layer, and the lack of unified data modeling. We can address this challenge with a unified data model that defines business entities, a unified service layer, and security implemented as authentication and authorization. The emerging concept of a Data Lake requires data to be stored according to a schema agreed upon between producer and consumer.
  3. Analytics has traditionally been associated with smaller data sets and performed in OLAP mode. It will be hard to replace or enhance existing Analytics/BI tools unless we convince stakeholders of the advantages Big Data brings to analytics: real-time analytics and the ability to process larger data sets in parallel. Algorithms are being ported to Big Data software packages, which is very exciting. Big Data technologies will leverage existing analytics platforms such as R, Python, and SAS and provide a unified platform for analytics. Big Data talent also has to merge with analytics skills to deliver descriptive, predictive, and prescriptive analytics.
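
One way to make the producer/consumer schema agreement from challenge 2 concrete is to validate every record against the contract at ingestion time, before it lands in shared storage. The sketch below is an illustration under stated assumptions in plain Python: the contract name `SENSOR_READING_V1`, its fields, and the `validate`/`ingest` helpers are hypothetical, and a real platform would more likely rely on a schema registry or a format with an enforced schema, such as Avro or Parquet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    kind: type
    required: bool = True

# Hypothetical contract agreed between a producer team and a consumer team.
# Field names and types are illustrative assumptions, not a standard.
SENSOR_READING_V1 = (
    Field("device_id", str),
    Field("timestamp", int),          # epoch seconds
    Field("temperature", float),
    Field("location", str, required=False),
)

def validate(record, schema):
    """Return a list of contract violations; an empty list means the record conforms."""
    errors = []
    known = {f.name for f in schema}
    for f in schema:
        if f.name not in record:
            if f.required:
                errors.append(f"missing required field '{f.name}'")
            continue
        if not isinstance(record[f.name], f.kind):
            errors.append(f"field '{f.name}' expected {f.kind.__name__}, "
                          f"got {type(record[f.name]).__name__}")
    # Fields outside the contract are rejected so consumers never see surprises.
    for extra in sorted(set(record) - known):
        errors.append(f"unexpected field '{extra}'")
    return errors

def ingest(records, schema):
    """Split a batch into accepted records and rejected (record, errors) pairs."""
    accepted, rejected = [], []
    for r in records:
        errs = validate(r, schema)
        if errs:
            rejected.append((r, errs))
        else:
            accepted.append(r)
    return accepted, rejected

# Usage: one conforming record and one that breaks the contract.
batch = [
    {"device_id": "pump-7", "timestamp": 1700000000, "temperature": 41.5},
    {"device_id": "pump-8", "temperature": "hot"},
]
accepted, rejected = ingest(batch, SENSOR_READING_V1)
for record, errs in rejected:
    print(record["device_id"], errs)
```

Returning an explicit list of violations, rather than silently dropping bad records, gives the producer team something actionable and keeps the shared store consistent for every consumer.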

Conclusion

Big Data analytics is now moving toward accurately defining data, handling it uniformly, and developing data-driven smart products.

Opinions expressed by DZone contributors are their own.
