
[DZone Research] The Three Vs of Big Data

A look at how the volume, variety, and velocity of big data sets impact the work of software developers and data scientists.

This article is part of the Key Research Findings from the 2018 DZone Guide to Big Data: Stream Processing, Statistics, and Scalability.

Introduction

For the 2018 DZone Guide to Big Data, we surveyed 540 software and data professionals to get their thoughts on various topics surrounding the field of big data and the practice of data science. In this article, we focus on how respondents told us their work is affected by the velocity, volume, and variety of data. 

The Three Vs

The concept of “Big Data” has been a difficult one to define. The sheer amount (volume) of data that can be stored on a single hard disk drive, solid state drive, secure digital memory card, etc., continues to grow. Hardware has advanced so quickly that I still remember buying a computer with a 10 gigabyte hard drive and being told by the salesperson at the electronics store, “you’ll never need more computer storage again.” With newer storage options (cloud and hybrid storage, for example), storing large volumes of data is no longer such a hard problem, though it still requires some planning. The complications added by “Big Data” include dealing not only with data volume but also with data variety (how many different types of data you have to deal with) and data velocity (how fast the data is being added).
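To make the three definitions concrete, here is a minimal Python sketch that measures each V for a small batch of incoming records. It is only an illustration: the record layout (arrival time, format label, payload), the ThreeVs class, and the measure_three_vs function are all hypothetical names chosen for this example, and real systems measure these properties in many different ways.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical record layout for this sketch:
# (arrival time in seconds, format label such as "json", raw payload)
Record = Tuple[float, str, bytes]


@dataclass
class ThreeVs:
    volume_bytes: int        # volume: total size of the data received
    variety: int             # variety: number of distinct formats seen
    velocity_per_sec: float  # velocity: records arriving per second


def measure_three_vs(records: Iterable[Record]) -> ThreeVs:
    """Summarize a batch of records along the three Vs."""
    total_bytes = 0
    formats = set()
    count = 0
    first = None
    last = None

    for arrived_at, fmt, payload in records:
        total_bytes += len(payload)
        formats.add(fmt)
        count += 1
        first = arrived_at if first is None else min(first, arrived_at)
        last = arrived_at if last is None else max(last, arrived_at)

    # Velocity needs a non-zero time window; report 0.0 when there is none.
    window = (last - first) if count > 1 else 0.0
    velocity = count / window if window > 0 else 0.0

    return ThreeVs(total_bytes, len(formats), velocity)


if __name__ == "__main__":
    batch = [
        (0.0, "json", b"{}"),
        (1.0, "csv", b"a,b"),
        (2.0, "json", b"[]"),
    ]
    print(measure_three_vs(batch))
```

In this sketch, volume is a byte count, variety is the number of distinct format labels, and velocity is the record count divided by the time between the first and last arrival. Which thresholds make any of these "big" is left open, since the article does not define them.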

Beyond that, “Big Data” is complicated by the fact that just storing this data is not enough; to get anything from the data being collected, it also needs to be analyzed. 76% of our survey respondents said they have to deal with large quantities of data (volume), while 46% said they have to work with high-velocity data, and 45% said they have to work with highly variable data.

Each of these “Big Data” categories comes with its own set of challenges. The most challenging data sources for those dealing with high-volume data were files (47%) and server logs (46%), and the most challenging data types were relational (51%) and semi-structured (e.g. JSON, XML; 39%). For those dealing with high-velocity data, server logs and sensors/remote hardware (both 42%) were the most challenging data sources, and semi-structured (36%) and complex data (e.g. graph, hierarchical; 30%) were the most challenging data types. Finally, regarding data variety, the biggest data source challenges came from files (56%). Server logs, sensor/remote hardware data, ERP and other enterprise systems, user-generated data, and supply-chain/logistics/other procurement data were each labeled “challenging” in between 28% and 32% of responses.

Conclusion

The Three Vs are a great way to conceptualize the basic building blocks of big data. But as data volume continues to grow due to the increased adoption of IoT devices, increased access to the internet across the world, and more, the variety and velocity of data will also continue to increase, potentially compounding the issues outlined above. 

What are your preferred methods and tools for working with the Three Vs?


