
Big Data For Dummies



Big data is data so voluminous and complex that it cannot be managed with traditional database tools. This definition does not do the concept full justice, but it gives a reasonable idea of what it is.

Importantly, to qualify as big data, the data should be petabytes or more in size, and it should be growing at an exponential rate.
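To make the idea of exponential growth concrete, here is a minimal sketch in Python. The starting size (1 TB), the growth factor (doubling each period), and the function name `periods_to_reach` are all hypothetical, chosen only for illustration; they do not come from any real data set.

```python
TERABYTE = 10**12  # bytes, decimal (SI) definition
PETABYTE = 10**15  # bytes, decimal (SI) definition


def periods_to_reach(start_bytes, target_bytes, growth_factor):
    """Count growth periods until a data set reaches target_bytes.

    Assumes the data set is multiplied by growth_factor once per period
    (for example, growth_factor=2 means it doubles every period).
    """
    if growth_factor <= 1:
        raise ValueError("growth_factor must be greater than 1")
    size = start_bytes
    periods = 0
    while size < target_bytes:
        size *= growth_factor
        periods += 1
    return periods


# Hypothetical example: a 1 TB data set that doubles every year.
print(periods_to_reach(TERABYTE, PETABYTE, 2))  # 10
```

Under these assumed figures, a data set that starts a thousand times smaller than a petabyte crosses that threshold after only ten doublings, which is why the rate of growth matters as much as the current size.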

Big data has caught the fancy of organizations across the board because it lets them rethink traditional business strategies, adapt to changing times, and generate more revenue in the process.

Because big data provides access to data in real time, it can help organizations improve their cyber security.

According to Forrester, an information technology market research firm, this data can also be used to make predictions, which will force organizations to modify their operations accordingly.

Big data can also give companies insight into consumers' buying habits by letting them track and evaluate shopping behavior.

This concept got a shot in the arm with the proliferation of mobile devices and other technological advances. 

Big data comprises three 'Vs' - Data Volume, Data Variety, and Data Velocity.

Data volume refers to the size of the data, which, as already discussed, is growing at an unbridled pace. In addition, data will be generated from more sources than before, and all of it needs to be handled.

Data variety refers to the increasing number of formats that big data will need to accommodate. Initially, there were only Excel tables, Word documents, and the like, but now there are also PDF files, streaming video, and audio and video files. More such formats can be expected in the future as new applications make their way into the world of IT.

Data velocity refers to the speed at which data arrives and the capability to analyze huge amounts of it in real time.

All these factors make big data a challenge for organizations to manage, analyze, and transfer. A new set of technologies has therefore been developed, and continues to be developed, to address these challenges.

Cloud computing is one of them, as it gives organizations access to analysis tools developed specifically for big data.

Despite all these advantages, some perceive big data as a threat. They argue that it can impinge on people's privacy, and that it poses a security risk and a danger of information theft.

These fears are not entirely misplaced. Preventing big data from stepping into such dangerous territory will definitely be a challenge.



Opinions expressed by DZone contributors are their own.
