
Don’t Be Tricked by Unstructured Data Analytics Technology


Don't be fooled. The idea of true unstructured data analytics technology is a myth. The real issue lies in structured data.


As big data continues to gain momentum, unstructured data analytics technology is following. It's said that 80% of all data in an enterprise is unstructured, which is not surprising when we consider the large amounts of audio and video data. With so much data at hand, some technique for analyzing it is required.

But how can you be sure you're not being misled by your unstructured data analytics technology?

There Is No Universal Unstructured Data Computing Technology

Unstructured data comes in a variety of formats, such as audio data, images, texts, web data, office documents, and device logs. Each data format needs a specific processing technique, such as speech recognition, image comparison, full-text search, or graphic computation. No single technique can analyze every form of unstructured data, and the techniques are not interchangeable: there’s no reason to replace image comparison with speech recognition, or to substitute graphic computation for full-text search.

A software vendor that specializes in a particular technology will advertise that domain, such as facial recognition or text mining, rather than claim general expertise without offering anything special. It's obviously easier to find target customers and go to market with a highly specialized product. A vendor peddling unstructured data analytics that fails to offer such a product is a jack of all trades but a master of none.

There Is a Universal Unstructured Data Storage Technology

There are indeed certain technological fields where unstructured data analytics dominates. In other fields, though, users simply need their unstructured data to be stored properly, so unstructured data analytics technology isn’t a universal demand. While there isn’t an all-encompassing unstructured data computing and analytics technology, a universal storage and management technology that allows for searching, adding, and deleting data does exist. Because unstructured data occupies much more space than structured data, it needs a different storage technique.

Unless the data is particularly large or searches require high concurrency, most network or distributed file systems (e.g., HDFS) are capable of meeting the demands of data storage and access. Yet a vendor seems less technically sophisticated if it sells nothing more than unstructured data storage and management services. So many software vendors strive to advertise their analytics capabilities even though they have no substantial services to offer. In contrast, a real storage service provider that offers high-capacity, high-performance data access focuses on promoting storage infrastructure rather than data analytics solutions.

Structured Data Analytics Is the Underlying Rock

The collection of unstructured data is often accompanied by the collection of structured data, such as the time, type, and duration of a piece of audio or video. Sometimes unstructured data becomes structured data after it's processed. A web log, for instance, can be split into visitor IP addresses, access times, keywords, and other attributes. The so-called unstructured data issue is, in essence, a structured one. And mature, standard structured data analytics technologies already exist in the form of relational databases.
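To make the web log example concrete, here is a minimal sketch of splitting raw log lines into structured attributes and then analyzing them with an ordinary relational database. The sample lines, the `/search?q=` URL layout, the `visits` table, and the helper names are all assumptions made for illustration, and SQLite stands in for whatever relational database is actually in use.

```python
import re
import sqlite3

# Hypothetical log lines in a Common Log Format style; the keyword is assumed
# to sit in a "q" query parameter of a /search URL.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /search?q=hadoop HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2016:13:56:01 +0000] "GET /search?q=spark HTTP/1.1" 200 1840',
    '198.51.100.7 - - [10/Oct/2016:14:02:11 +0000] "GET /search?q=hadoop HTTP/1.1" 200 2326',
]

# Captures the visitor IP, the access time, and the search keyword.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "GET /search\?q=(?P<keyword>[^ &"]+)'
)


def parse_line(line):
    """Split one raw log line into (ip, access_time, keyword), or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return (match.group("ip"), match.group("time"), match.group("keyword"))


def load_rows(lines):
    """Load the parsed attributes into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (ip TEXT, access_time TEXT, keyword TEXT)")
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn


def keyword_counts(conn):
    """Once the data is structured, the analysis is plain SQL."""
    cursor = conn.execute(
        "SELECT keyword, COUNT(*) FROM visits "
        "GROUP BY keyword ORDER BY COUNT(*) DESC, keyword"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(keyword_counts(load_rows(SAMPLE_LOG)))
```

After the parsing step, nothing in the analysis is specific to unstructured data: the counting is a standard `GROUP BY` query.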

But to grab users’ attention, vendors formulated the concept of unstructured data analytics to disguise the underlying rock: the structured data problem.

That is why users, on the demand side, need to understand what treatment their data requires. If the data simply requires proper storage, then an open-source network or distributed file system is sufficient. If high-performance access is needed, go to a storage vendor. If generated structured data needs to be analyzed, look to database processing. If the data needs a specialized type of processing, find a professional vendor and technology in that specialized area. In other words, be exact about the type of processing your data needs.
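The decision rule in the paragraph above can be written down as a simple lookup. This is only an illustrative sketch: the `Need` categories and the `recommend` helper are hypothetical names, and the recommendations are paraphrased from the text rather than taken from any product catalog.

```python
from enum import Enum


class Need(Enum):
    """Hypothetical labels for the four kinds of requirement described above."""
    PLAIN_STORAGE = "plain storage"
    HIGH_PERFORMANCE_ACCESS = "high-performance access"
    STRUCTURED_ANALYSIS = "analysis of generated structured data"
    SPECIALIZED_PROCESSING = "specialized processing"


# Each requirement maps to the kind of technology the article points to.
RECOMMENDATION = {
    Need.PLAIN_STORAGE: "open-source network file system",
    Need.HIGH_PERFORMANCE_ACCESS: "storage vendor",
    Need.STRUCTURED_ANALYSIS: "database processing",
    Need.SPECIALIZED_PROCESSING: "professional vendor in the specialized area",
}


def recommend(need):
    """Return the kind of technology to look for, given an exact processing need."""
    return RECOMMENDATION[need]


if __name__ == "__main__":
    for need in Need:
        print(f"{need.value}: {recommend(need)}")
```

Note that "unstructured data analytics" never appears as an answer: every concrete need resolves to a more specific technology.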


Topics:
unstructured data, structured data, data analytics, big data

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
