Don’t Be Tricked by Unstructured Data Analytics Technology

Don't be fooled. The idea of true unstructured data analytics technology is a myth. The real issue lies in structured data.

As big data continues to gain momentum, unstructured data analytics technology is following suit. It's often said that 80% of all data in an enterprise is unstructured, which is not surprising when we consider the large volumes of audio and video data. With so much data at hand, some technique for analyzing it is required.

But how can you be sure you're not being misled by your unstructured data analytics technology?

There Is No Universal Unstructured Data Computing Technology

Unstructured data comes in a variety of formats, such as audio, images, text, web data, office documents, and device logs. Each format needs its own processing technique, such as speech recognition, image comparison, full-text search, or graphic computation. No single technique can analyze all forms of unstructured data, and the techniques are not interchangeable: speech recognition cannot replace image comparison, and graphic computation cannot substitute for full-text search.

A software vendor that specializes in a certain technology will advertise that specialty, such as facial recognition or text mining, rather than present itself as a general expert with nothing particular to offer. It's easier to find target customers and to market a highly specialized product. A vendor peddling unstructured data analytics without such a specialized product is a jack of all trades but a master of none.

There Is a Universal Unstructured Data Storage Technology

There are indeed certain technological fields where unstructured data analytics dominates. But in other fields, users only require that their unstructured data be stored properly. In general, unstructured data analytics technology isn't a universal demand. While there isn't an all-encompassing unstructured data computing and analytics technology, a universal storage and management technology that allows for searching, adding, and deleting data does exist. Because unstructured data occupies much more space than structured data, it needs a different storage technique.

Unless the data is particularly huge or highly concurrent searches are required, most networked or distributed file systems (e.g., HDFS) are capable of meeting the demands of data storage and access. Yet a vendor that sells nothing more than unstructured data storage and management services tends to look less technically sophisticated. So, many software vendors strive to advertise their analytics capabilities even though they have no substantial analytics services to offer. In contrast, a real storage service provider that offers high-capacity and high-performance data access focuses on promoting storage infrastructure rather than data analytics solutions.

Structured Data Analytics Is the Underlying Rock

The collection of unstructured data is often accompanied by the collection of structured data, such as the time, type, and duration of a piece of audio or video. Sometimes, unstructured data becomes structured data after it's processed. A web log, for instance, may be split into visitor IP addresses, access times, keywords, and other attributes. The so-called unstructured data issue is, in essence, a structured one. And there are already mature, standard structured data analytics technologies in the form of relational databases.
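
To make the web log case concrete, here is a minimal Python sketch of that conversion. It is only an illustration under stated assumptions: the log format, the sample lines, and the names `parse_line`, `load_visits`, and `keyword_counts` are all made up for this example, and SQLite stands in for whatever relational database you actually use.

```python
import re
import sqlite3

# Hypothetical log lines in a simplified, made-up format:
#   <ip> [<ISO timestamp>] "GET /search?q=<keyword>"
LOG_LINES = [
    '203.0.113.5 [2017-03-01T10:15:00] "GET /search?q=hadoop"',
    '203.0.113.5 [2017-03-01T10:16:30] "GET /search?q=spark"',
    '198.51.100.7 [2017-03-01T11:02:10] "GET /search?q=hadoop"',
    'not a valid log line',
]

# One named group per structured attribute we want to extract.
LINE_PATTERN = re.compile(
    r'^(?P<ip>\S+) \[(?P<time>[^\]]+)\] "GET /search\?q=(?P<keyword>[^"&]+)"$'
)


def parse_line(line):
    """Return (ip, access_time, keyword), or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return match.group("ip"), match.group("time"), match.group("keyword")


def load_visits(lines):
    """Parse the lines and load the matching rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (ip TEXT, access_time TEXT, keyword TEXT)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn


def keyword_counts(conn):
    """Count visits per keyword, most frequent first (ties broken by keyword)."""
    cursor = conn.execute(
        "SELECT keyword, COUNT(*) AS hits FROM visits "
        "GROUP BY keyword ORDER BY hits DESC, keyword"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = load_visits(LOG_LINES)
    print(keyword_counts(connection))
```

Once the lines have been split into columns, the analysis itself is an ordinary SQL aggregation, which is the point of this section: the hard part is the format-specific parsing, and everything after it is plain structured data work.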

But to grab users’ attention, vendors formulated the concept of unstructured data analytics to disguise the underlying rock: the structured data problem.

That is why users, on the demand side, need to understand what treatment their data requires. If the data simply requires proper storage, then an open-source networked or distributed file system is sufficient. If high-performance access is needed, go to a storage vendor. If the structured data generated from it needs to be analyzed, look to database processing. If the data needs a specialized type of processing, find a professional vendor and technology in that specialized area. In other words, be exact about the type of data processing you need.

Published at DZone with permission of Buxing Jiang. See the original article here.
