vFabric Data Director Accelerates Database Virtualization and Big Data Adoption

Virtualization continues to be one of the top priorities for CIOs. As the share of virtualized workloads approaches 60%, enterprises are looking at database and big data workloads as the next target. Their goal is to realize the benefits of virtualization across the many relational databases sprawling through their data centers. With the increasing popularity of analytic workloads on Hadoop, virtualization also presents a fast and efficient way to get started on existing infrastructure and to scale the deployment dynamically as needed.

VMware’s vFabric Data Director 2.5 now extends the benefits of virtualization to both traditional relational databases, like Oracle, SQL Server and Postgres, and multi-node Big Data solutions like Hadoop. SQL Server and Oracle represent the majority of databases in enterprises, and Hadoop is one of the fastest-growing data technologies in the enterprise.

vFabric Data Director enables the most common databases found in the enterprise to be delivered as a service with the agility of public cloud and enterprise-grade security and control.

The key new features in vFabric Data Director 2.5 are:

  • Support for SQL Server – Currently supported versions of SQL Server are 2008 R2 and 2012.
  • Support for Apache Hadoop 1.0-based distributions: Apache Hadoop 1.0, Cloudera CDH3, Greenplum HD 1.1 and 1.2, and Hortonworks HDP-1. Data Director leverages VMware’s open source Project Serengeti to deliver this capability.
  • Streamlined Data Director Setup – Complete setup in less than an hour
  • One-click template creation for Oracle and SQL Server through ISO based database and OS installation
  • Oracle database ingestion enhancements – Now includes Point In Time Refresh (PITR)

Data Director’s self-provisioning enables a whole new level of operational efficiencies that greatly accelerates application development. With this new release, Data Director now delivers these efficiencies in a heterogeneous database environment.

Current Ways of Database Management

Historically, there were only two primary ways of managing databases.

  • Wild West: where everyone has the rights and tools to create and manage their own databases.
  • Highly Controlled:  where only a few DBAs could create and manage databases and developers had to submit all requests in the form of a ticket.

While there are a variety of approaches in between these two extremes, each option demands more labor-intensive management and produces increasingly higher levels of inefficiency as the database infrastructure grows.

In the case of the Wild West, this leads to database sprawl, with some companies managing hundreds or even thousands of heterogeneous databases with a variety of database versions and operating systems. Many of these databases reside on under-managed and unsecured physical machines. Others, due to their heavy usage, grow beyond their physical host capacity. Add to this that many of them host duplicate data, where the simplest approach is to throw additional hardware at the problem, and it easily becomes cost-prohibitive. That’s even before you consider that these fragmented database environments are hard to secure and monitor, and that corporate policies are hard to enforce across them.

While the Highly Controlled approach initially sounds better, it quickly impacts efficiency as overwhelmed DBAs manage hundreds of databases. This leads to increased lead time for database provisioning to developers, which results in longer application development times. We have seen provisioning times of days to months, depending on DBA capacity and the need for hardware procurement.

These are the challenges of dealing with traditional databases; multi-node data architectures, like Hadoop, make these issues exponentially more complex. Combined with the prevalent shortage of skills in the marketplace today for effectively managing Hadoop implementations, this often forces organizations to either balkanize their data or outsource these analytical workloads.

Data Director Database Virtualized Platform

Data Director offers a much simpler and more efficient answer to these kinds of large-scale database environment challenges. It enables organizations to realize operational efficiencies while streamlining database management through policy enforcement and unified security models. Data Director allows organizations to think of database provisioning and management as a true heterogeneous service (DBaaS).

By allowing developers to provision databases on demand, based on predefined templates, through a self-service interface under a common security model, Data Director delivers DBaaS with an elastic, multi-tenant platform under a single management architecture, while maximizing existing storage.

What’s new in Data Director 2.5

With the release of Data Director 2.5, VMware brings to the marketplace new database support, as well as exciting feature enhancements.

In addition to Oracle 10gR2 and 11gR2 and vFabric Postgres 9.0 and 9.1, Data Director now supports Microsoft SQL Server 2008 R2 and 2012 databases, as well as Hadoop.

Data Director Database Creation Wizard

The newly introduced SQL Server support leverages existing tools and comes pre-configured with a number of built-in configurations. To leverage the Active Directory security model of SQL Server, Data Director also supports joining newly created VMs to a domain as part of the provisioning process. Whether for SQL Server, Oracle or vFabric Postgres, this provisioning and management process can now be handled through a single pane of glass.

Data Director for Hadoop Dashboard

The elasticity of the underlying vSphere virtualization platform helps Hadoop achieve new levels of efficiency. This architecture enables organizations to share the existing infrastructure with Big Data analytical workloads to deliver optimal storage capacity and performance.

While the idea of a dynamically scalable Hadoop cluster, capable of using spare data center capacity, was part of Project Serengeti from the beginning, the recent enhancement of its on-demand compute capacity makes this notion much easier to implement.

Using the new Hadoop Virtualization Extensions (HVE), which resulted from VMware’s work with the Apache Hadoop community, Serengeti can now scale up and shut down compute nodes on demand, based on resource availability, while still fully leveraging HDFS data locality.

Greenplum HD 1.2 is the first distribution to include HVE. It helps Hadoop to truly be aware of the underlying virtualization, which in turn allows Hadoop to deliver the same level of performance already experienced by other vSphere workloads.

Serengeti Shared Workload Architecture

Using the combination of Data Director and Serengeti, Hadoop users can quickly and efficiently analyze data that already exists in HDFS within minutes, not hours. And when another, perhaps more important, workload demands these previously unused compute cycles, Serengeti releases them back to the pool.
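The elastic behavior described above can be illustrated with a minimal sketch. This is not Serengeti's actual algorithm or API; the `Cluster` class, the `target_compute_nodes` function, and the notion of "spare slots" are all hypothetical names chosen for illustration. The sketch assumes that data nodes stay fixed (so HDFS data remains in place) and that only stateless compute nodes are added or released.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy model of a Hadoop cluster with separate data and compute nodes."""
    data_nodes: int      # hold HDFS blocks; never scaled in this sketch
    compute_nodes: int   # stateless task runners; safe to add or remove
    max_compute: int     # hypothetical upper bound set by an administrator


def target_compute_nodes(cluster: Cluster, spare_slots: int, pending_tasks: int) -> int:
    """Return how many compute nodes the cluster should run next.

    spare_slots: node-sized slots of unused capacity in the shared pool.
                 A negative value means a higher-priority workload wants
                 capacity back from this cluster.
    pending_tasks: queued tasks, assuming one task per compute node.
    """
    available = cluster.compute_nodes + spare_slots
    wanted = min(pending_tasks, cluster.max_compute)
    return max(0, min(wanted, available))


if __name__ == "__main__":
    cluster = Cluster(data_nodes=4, compute_nodes=2, max_compute=8)
    # Plenty of spare capacity and queued work: grow from 2 to 7 nodes.
    print(target_compute_nodes(cluster, spare_slots=5, pending_tasks=10))
    # Another workload reclaims capacity: release all compute nodes.
    print(target_compute_nodes(cluster, spare_slots=-2, pending_tasks=10))
```

Keeping the data nodes out of the scaling decision is what lets the compute tier shrink to zero without moving or losing any HDFS data in this model.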

Build your DBaaS using Data Director

Database-as-a-service (DBaaS) is designed to overcome the application development challenges described above by delivering a platform that helps provision and administer databases in a faster, more secure and more cost-effective manner.


One of the key objectives of Data Director is to bring order to the database management space. By streamlining many of the repetitive tasks with easy-to-create templates, Data Director is able to enforce predefined policies across the enterprise. This in turn allows IT to leverage the High Availability of the underlying virtualized infrastructure to dynamically scale and deliver higher SLAs.

By supporting common security and administration policies, your new DBaaS can apply them across a farm of databases running a variety of operating systems. These policies enforce uniformity of frequent administration tasks across databases, ensuring compliance, consistency, and security, all at a lower total cost.


One of the ways Data Director is able to increase efficiency is by simplifying routine database life-cycle management tasks and, when possible, automating them to limit human interaction purely to monitoring. Data Director’s support for point-in-time restore of ingested databases, combined with its seamless patch management and upgrade process, is a perfect example of listening to developers, DBAs and IT personnel and delivering the very features they need.

Another key feature of any DBaaS solution is the ability to automate administration and monitoring tasks like creation, backup, recovery, tuning, optimization, patching, and upgrading. Based on custom policies, Data Director enables DBAs to automate most mundane database management tasks and focus on proactive maintenance and dealing with abnormalities.


The basic requirement of anything as a service is the capability for users to self-provision the necessary resources. Data Director allows database consumers such as application developers, testers, and architects to provision databases easily, based on built-in or custom templates and under fine-grained security constraints.

By reducing the time it takes to perform these tasks, Data Director greatly reduces the time it takes to deliver a new application to the market. In many cases, even the less technical users can securely leverage Data Director services while being compliant with all corporate IT policies.
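To make the idea of template-based self-provisioning under security constraints more concrete, here is a minimal sketch. It does not reflect Data Director's actual data model; `Template`, `Quota`, and `provision` are hypothetical names, and the specific limits checked (allowed engines, database count, storage) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A predefined database configuration published by a DBA (hypothetical)."""
    name: str
    engine: str
    memory_gb: int
    storage_gb: int


@dataclass(frozen=True)
class Quota:
    """Limits that apply to one self-service user or team (hypothetical)."""
    max_databases: int
    max_storage_gb: int
    allowed_engines: frozenset


def provision(template: Template, quota: Quota, existing_count: int,
              used_storage_gb: int, db_name: str) -> dict:
    """Validate a self-service request and return the spec of the database to create.

    Raises PermissionError when the request would violate the quota.
    """
    if template.engine not in quota.allowed_engines:
        raise PermissionError(f"engine {template.engine!r} is not allowed")
    if existing_count + 1 > quota.max_databases:
        raise PermissionError("database count quota exceeded")
    if used_storage_gb + template.storage_gb > quota.max_storage_gb:
        raise PermissionError("storage quota exceeded")
    return {
        "name": db_name,
        "template": template.name,
        "engine": template.engine,
        "memory_gb": template.memory_gb,
        "storage_gb": template.storage_gb,
    }


if __name__ == "__main__":
    small = Template("small-postgres", "postgres", memory_gb=4, storage_gb=50)
    dev_quota = Quota(max_databases=2, max_storage_gb=120,
                      allowed_engines=frozenset({"postgres"}))
    print(provision(small, dev_quota, existing_count=0,
                    used_storage_gb=0, db_name="orders_dev"))
```

The point of the design is that a developer only picks a template and a name; every sizing and policy decision has already been made by whoever published the template and set the quota.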

Extending Data Director

With the recent additions, VMware’s vFabric Data Director finally makes the idea of heterogeneous database management a reality. This unified DBaaS platform can leverage the underlying virtualization to standardize management of the entire life-cycle of databases, regardless of the provider.

In cases where additional third-party tool-set integration is necessary, Data Director’s capabilities can easily be extended using the integrated REST API. This interface exposes many of the same capabilities found in the Data Director GUI.
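As a rough illustration of what such an integration might look like, the sketch below builds (but does not send) an HTTP request for creating a database from a template. The host name, the `/databases` path, the JSON field names, and the bearer-token authentication are all assumptions made for this example and are not taken from the Data Director API; consult the product's API documentation for the real resource paths and payloads.

```python
import json
import urllib.request

# Hypothetical base URL; a real deployment would have its own address and paths.
BASE_URL = "https://datadirector.example.com/api"


def build_create_database_request(base_url: str, token: str,
                                  name: str, template: str) -> urllib.request.Request:
    """Build a POST request asking the service to create a database.

    The endpoint, payload fields and auth scheme are illustrative assumptions.
    The request is only constructed here, not sent.
    """
    payload = json.dumps({"name": name, "template": template}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/databases",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_database_request(BASE_URL, "example-token",
                                        "orders_dev", "small-postgres")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

A third-party tool would pass a request like this to `urllib.request.urlopen` (or an equivalent HTTP client) and then inspect the response to track the provisioning task.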

Additional Info

This entry was posted in Data Director, Postgres, Serengeti, vFabric and tagged apache, big, CIO, Cloudera, data, database, DBA, director, Greenplum, hadoop, hortonworks, Microsoft, on demand, Oracle, Postgres, security, serengeti, Server, sql, virtualization by Mark Chmarny.

About Mark Chmarny

During his 15+ year career, Mark Chmarny has worked across various industries. Most recently, as a Cloud Architect at EMC, Mark developed numerous Cloud Computing solutions for both Service Provider and Enterprise customers. As a Data Solution Evangelist at VMware, Mark works in the Cloud Application Platform group where he is actively engaged in defining new approaches to distributed data management for Cloud-scale applications. Mark received a Mechanical Engineering degree from Technical University in Vienna, Austria and a BA in Communication Arts from Multnomah University in Portland, OR.
