Overview of Big Data for Beginners
An introductory discussion of the field of big data, covering terms such as velocity, variety, and volume.
Five Vs of Big Data
You may be wondering why I’m starting with the Five Vs of Big Data before even explaining what Big Data is. Trust me on this one: the definition of Big Data makes much more sense once you know the Five Vs.
When we deal with data, especially Big Data, there are “Five Vs” you should know.
The first “V” is Volume – this refers to the vast amount of data that gets generated every second in the current information age.
The quantity of data generated each year is exponentially higher than in previous years. By the year 2025, the total volume of data created is expected to be ten times the volume accumulated at the time of writing.
In a traditional database system, data is typically measured in gigabytes (GB) and terabytes (TB), but Big Data is measured in petabytes (PB) and exabytes (EB). At that scale, the data is too large to manage with a traditional database system.
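To get a feel for the gap between these units, here is a minimal Python sketch that formats a byte count using decimal (SI) units, where each unit is 1,000 times the previous one. The function name `human_readable` and the sample sizes are illustrative assumptions, not part of any particular tool.

```python
# Decimal (SI) storage units; each one is 1,000x the previous.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes: int) -> str:
    """Format a byte count with the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit we know.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


# Hypothetical sizes: a modest database versus a Big Data store.
print(human_readable(5 * 10**9))   # 5.0 GB
print(human_readable(2 * 10**15))  # 2.0 PB
```

A petabyte is a million gigabytes, which is why a dataset at that scale is usually spread across many machines rather than held on a single database server.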
The primary reason for the increase in the volume of data generated is the rise in the number of devices such as mobile phones, surveillance cameras, sensors, and other IoT devices. Another important source of data is web access logs: every search we do on Google and every website we visit on the World Wide Web generates data.
The second “V” is Velocity – this refers to the speed at which the data moves and the rate at which new data gets generated.
Not only has the volume of generated data increased, but the speed at which this data arrives and moves has also increased drastically.
The primary reason for this is the significant increase in the speed at which data is communicated over the internet. Also, more and more people are getting access to high-speed internet every day, which increases the rate at which new data is generated and moved.
The third “V” is Variety – this refers to the different types of data that get generated and used.
As the number of types of devices that generate data has increased, the variety of data these devices produce has also increased drastically. Data is no longer limited to traditional structured data or text messages; it also includes unstructured data such as images, videos, and much more.
The fourth “V” is Veracity – this refers to the messiness or integrity of the data.
The data we obtain from a source is not always accurate. The quality of the data received will sometimes be low because of technical issues, human error, or intentional malicious manipulation.
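As a rough illustration of what a veracity check can look like in practice, the sketch below filters out records that are missing a value or fall outside a plausible range. The record layout, the `temperature_c` field, and the range bounds are all hypothetical; real pipelines would define their own rules.

```python
def is_valid_reading(reading: dict) -> bool:
    """Return True if a (hypothetical) sensor reading passes basic sanity checks."""
    temp = reading.get("temperature_c")
    # Reject missing or non-numeric values (bool is excluded explicitly,
    # because Python treats it as a subclass of int).
    if isinstance(temp, bool) or not isinstance(temp, (int, float)):
        return False
    # Reject values outside an assumed plausible range for air temperature.
    return -50.0 <= temp <= 60.0


readings = [
    {"sensor_id": "a1", "temperature_c": 21.5},
    {"sensor_id": "a2", "temperature_c": None},    # missing value
    {"sensor_id": "a3", "temperature_c": 9999.0},  # out of range
]

clean = [r for r in readings if is_valid_reading(r)]
print(len(clean), "of", len(readings), "readings passed")
```

Checks like these only catch obvious problems; deliberately manipulated data that stays within plausible bounds needs other safeguards.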
The fifth and the final “V” is Value – this refers to the value we get by analyzing the data.
There is no point in handling Big Data if we are not going to get any business value out of it. So, it’s essential to analyze the data generated by various sources and extract value from it.
What Is Big Data?
The definition of Big Data is often ambiguous. To be more precise, it’s difficult to say when data becomes Big Data. You may think that if the size of the data is enormous, then it’s Big Data, but this is not entirely true. Now that we know the Five Vs of Big Data, let’s go ahead and define what Big Data is.
“Big Data” is a term used to refer to extremely large, high-speed, vastly diverse, and complex data that can be analyzed to derive business value.
The data can be both structured and unstructured. It can’t be managed with traditional database management systems and should instead be handled by developing highly scalable, maintainable, and fault-tolerant data systems.
Published at DZone with permission of Manoj G T. See the original article here.