A Lap Around Big Data with Microsoft HDInsight
Big data is synonymous with the three Vs: volume, velocity, and variety. From traditional e-commerce systems to modern social networks, data retention in all of these systems depends on this kind of platform. Let's look at a scenario of modern e-commerce analytics after integration with big data.
- A big data platform typically works by first storing data in clusters and then processing it through MapReduce workflows. The map phase splits the input data into independent chunks, each processed by the appropriate algorithm. The output of the map phase then moves to the shuffle/sort phase, and finally the output of the shuffle phase becomes the input to the reduce phase.
- Let's look at a typical big data MapReduce workflow.
- Microsoft's big data platform works in exactly the same way. Named Microsoft HDInsight, it is a collaborative solution with Hortonworks that simplifies running complex batch scripts. Let's take a brief look at the HDInsight/Hadoop ecosystem.
- Microsoft's big data platform offers solutions that range from storing data in HDFS, to query processing with Hive, to implementing business intelligence analytics with Excel PowerPivot, SSAS, and SSRS.
- Storing data in HDFS: petabytes to zettabytes of data can be stored in HDFS clusters by means of a name node followed by data nodes. In Azure HDInsight, each data node is integrated with a worker role and a compute cluster. Alternatively, you can use Azure Blob storage, which is built from three layers: a front-end layer (which attaches an OAuth/security layer for authentication), a partition layer (for mapping to Azure Queue, Table, and Blob storage), and a stream layer (three-layer HA for scaled-out data streams).
- To program on HDInsight, you can choose from Java, C#, F#, .NET, the JavaScript (.js) API, and the LINQ to Hive APIs, which let you write code against the Hadoop ecosystem, including Pig, Hive, Mahout, Cascading, and Pegasus.
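To make the map, shuffle/sort, and reduce phases described above more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop or HDInsight code and does not use any HDInsight API. The order records, the chunk size, and all function names are hypothetical, chosen only to mirror the e-commerce scenario. A real job would run the map and reduce tasks in parallel across the cluster's data nodes.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical e-commerce order records: (product category, amount in cents).
ORDERS = [
    ("books", 1250),
    ("games", 5999),
    ("books", 725),
    ("music", 999),
    ("games", 2001),
]


def split_into_chunks(records, chunk_size):
    """Split the input into independent chunks, one per (simulated) map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_phase(chunk):
    """Emit one (key, value) pair per record in the chunk."""
    return [(category, amount) for category, amount in chunk]


def shuffle_phase(mapped_pairs):
    """Group values by key and sort by key, as the shuffle/sort step does."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Aggregate all values that share one key."""
    return key, sum(values)


def run_job(records, chunk_size=2):
    """Run map, shuffle/sort, and reduce in sequence on a single machine."""
    chunks = split_into_chunks(records, chunk_size)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    shuffled = shuffle_phase(mapped)
    return [reduce_phase(key, values) for key, values in shuffled]


if __name__ == "__main__":
    for category, total_cents in run_job(ORDERS):
        print(category, total_cents)
```

Amounts are kept as integer cents so that the per-category totals are exact. The chunk size only changes how the input is divided among the simulated map tasks, not the final result.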