
Free Your Data With Data Virtualization


With the rise of data-oriented information systems, the need for simple and efficient access to data is growing. In response, data virtualization tools are rapidly emerging.



The need for easy access to data is exploding! Agile development of mobile and web applications, citizen data scientists, 360-degree vision, agile BI: we want it now! Whether you are a developer or a business user, data has become the basic "material" of everyone's work. Yet despite the data lake wave, this data is still not within everyone's reach. Between complex access paths, heterogeneous models, and data fragmented across IS silos, data is not yet available in self-service. So, how do we free it and make it easier to use?

Data Virtualization: Data's Single Access Point

Faced with this challenge, data virtualization solutions are experiencing real momentum. In 2017, 56% of Forrester respondents said they wanted to implement, or had already implemented, this type of solution. But what is this type of solution, exactly? Data virtualization consists of gathering disparate data by virtualizing it into one view, creating a unique and consolidated model, all while storing no data itself. By connecting to data regardless of its origin or storage technology, and making it easily accessible to anyone through a simple SQL query or REST API, data virtualization accelerates data access. You can join a SQL database, a NoSQL database, a CSV file, and a REST service, and return the result through a single SQL or REST query. The outcome is self-service data accessible to any audience, with nothing replicated into these solutions, because nothing is stored there.
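To make the idea concrete, here is a minimal sketch in Python of what a virtualization layer does behind the scenes: two silos (a relational table and a CSV export) are surfaced behind one SQL interface. The table names, columns, and data are invented for illustration; real products such as those named below do this declaratively, across live sources, without copying the data.

```python
import csv
import io
import sqlite3

# Silo 1: a "customers" table in a relational database
# (an in-memory SQLite database stands in for it here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "Alice"), (2, "Bob")])

# Silo 2: order data arriving as a CSV export from another system
# (this string stands in for a file or a REST payload).
orders_csv = "customer_id,amount\n1,120\n1,80\n2,45\n"
db.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
for row in csv.DictReader(io.StringIO(orders_csv)):
    db.execute("INSERT INTO orders VALUES (?, ?)",
               (int(row["customer_id"]), int(row["amount"])))

# The consumer sees one consolidated model through a single SQL query.
unified = db.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(unified)  # [('Alice', 200), ('Bob', 45)]
```

The key difference from this toy: a virtualization tool keeps the sources where they are and federates the query at run time, rather than loading the CSV into the database.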

Moreover, the data thus exposed gains two precious assets:

  • Performance, via query optimization across the different source systems and intelligent cache use. In some cases, response times drop from several minutes to near-instant.

  • Simpler securing of data for external consumers of the source applications: the solution acts as a single, centrally controlled entry point to the data.
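The caching side of the first point can be sketched very simply. The snippet below is a hypothetical illustration, not any vendor's mechanism: an expensive federated lookup is memoized so that repeat consumers are served without touching the source systems again.

```python
import functools
import time

CALLS = {"count": 0}  # tracks how often the sources are actually hit

@functools.lru_cache(maxsize=128)
def customer_360(customer_id: int) -> tuple:
    """Stand-in for a federated query joining several source systems."""
    CALLS["count"] += 1
    time.sleep(0.01)  # simulated cross-system latency
    return (customer_id, f"customer-{customer_id}")

first = customer_360(42)   # goes out to the sources
second = customer_360(42)  # answered from the cache, no second source call
assert CALLS["count"] == 1
```

Real data virtualization caches are of course smarter (invalidation, partial results, pushdown), but the effect on the consumer is the same: the second query returns in a snap.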

This provides a single entry point that is quick to set up and that, in many cases, can replace an MDM or basic replication via ETL or CDC.

Data Virtualization: A New Integration Point Earning Its Place

When we discuss this subject, the question that comes up almost every time is, "But how is this different from an ESB?"

Of course, an ESB can expose data from disparate sources. But it will never match the ease of data exposure or the performance. So to speak, the ESB is the 4x4 of data integration, while data virtualization is its Ferrari.

Indeed, with an ESB, one can expose data, but in a more time-consuming, more complex way and with unoptimized performance. In return, it can host complex business services and use connectors that are not data-centric.

A data virtualization tool, by contrast, only exposes data services, but it does so in an optimized way, with dedicated and productive tooling.

We can therefore see that, in practice, these two solutions are more complementary than competing.

What Solutions Are on the Market?

Many vendors are present on the market, from pure players like Denodo, to Red Hat with the open-source JBoss Data Virtualization, to more generalist vendors like TIBCO with TIBCO Data Virtualization. In any case, all of them know they have work to do to amplify their communication on this subject, so there is a good chance things will move, and the products will move with them.

What Do Customers Gain?

In terms of customer feedback, the point that comes up every time is the extreme speed of implementation. Where an ETL required a week of preparation work, a data virtualization solution can take as little as an hour! In terms of use cases, 360-degree vision comes up often. Customer data is typically scattered across multiple applications, and a data virtualization solution provides a unified view of a customer very quickly. It is also a tool widely used by frontend projects, as Facebook demonstrates with its in-house solution, GraphQL.

The emergence of this type of technology will make data consumption much easier, faster, and more efficient than in the past. This is a real simplification that will give data access to many more users while remaining complementary to other integration technologies.



