Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

A Lap Around Big Data with Microsoft HDInsight

DZone's Guide to

A Lap Around Big Data with Microsoft HDInsight

· Big Data Zone
Free Resource

Free O'Reilly eBook: Learn how to architect always-on apps that scale. Brought to you by Mesosphere DC/OS–the premier platform for containers and big data.

Big Data synonyms with three V s :  Volume , Velocity & Variety. Even with traditional e-commerce system to modern social networks all systems data conservation is dependent on this platform. Let's check a scenario of modern e-commerce analytics after integration with Big Data.

bigdata

ecommerce

  • Big Data platform typically works by storing data first into clusters , then process the data through MapReduce workflows which executes by Mapping the input data through independent chunks processed by appropriate algorithms, the output from Map phase then moves to Shuffle/Sorting phase & finally the output from Shuffle phase comes to Reduce phase as input.
  • Lets check a typical Big Data MapReduce workflow.

storedata

processdata

MR

  • Microsoft's BigData platform works exactly same way as a collaborative solution with Horton Works named as Microsoft HDInsight. Which typically simplifies the solution of running complex batch scripts. Lets cover a little insight of HDInsight/Hadoop ecosystem.

HDinsight

  • Microsoft's Big Data platform unveils solutions from storing data into HDFS to query processing on Hive up to implementing Business Intelligence analytics on Excel Powerpivot, Powerpivot, SSAS & SSRS solutions.

MSBigdata

  • Storing data into HDFS : Petabytes to Zetabytes of data to be stored in HDFS clusters by means of Name Node followed by Data Nodes, in Azure HDInsight each Data Node is integrated with a worker roles & compute cluster. Alternatively , you can leverage the solutions using Azure Blob Storage utilizing  Front End(attaches OAuth/Security layer for authentication), Partition layer: for mapping with Azure Queue, table & blob storages , Stream layer : 3 layer HA for scaled out data stream.

HDFS

  • In order to programming on HDInsight , you can opt for Java, C#, F#, .NET, .js API, LINQ to Hive APIs which leverages to code on hadoop ecosystems including hadoop pig, hive, mahout, cascading, pegasus.

hdinsight_API

Microsoft's Hadoop Vision
Microsoft's Hadoop Vision

Easily deploy & scale your data pipelines in clicks. Run Spark, Kafka, Cassandra + more on shared infrastructure and blow away your data silos. Learn how with Mesosphere DC/OS.

Topics:

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}