Do You Have the Data Agility Your Business Needs?
If companies are to thrive in a data-driven economy, they can’t be handcuffed to old technologies. Data infrastructures today must keep pace with data requirements.
Data is the new battleground. For companies, the situation is clear: their future depends on how quickly and efficiently they can turn data into accurate insights. This challenge has put immense pressure on CIOs to not only manage ever-growing data volumes, sources, and types but to also support more and more data users as well as new and increasingly complex use cases.
Fortunately, CIOs can look for support in their plight from unprecedented levels of technological innovation. New cloud platforms, new data platforms like Apache Hadoop, and real-time data processing are just some of the modern data capabilities at their disposal. However, innovation is occurring so quickly and changes are so profound that it is impossible for most companies to keep pace, let alone leverage those factors for a competitive advantage.
It's clear that data infrastructures today can’t be static if they are to keep pace with the data requirements of the business. Today’s competitive environment requires adaptive and scalable infrastructures able to solve today’s challenges and address tomorrow’s needs; after all, the speed with which you process and analyze data may be the difference between winning and losing the next customer. This is significantly more important today than 10 or 15 years ago since companies used to make a strategic database choice once and keep running it for a decade or two. Now we see companies updating their data platform choices far more frequently to keep up.
If companies are to thrive in a data-driven economy, they can’t afford to be handcuffed to ‘old’ technologies; they need the flexibility and agility to move at a moment’s notice to the latest market innovations. However, it’s not enough for companies to simply be technology agnostic; they also need to be in a position to re-use data projects, transformations, and routines as they move between platforms and technologies.
How can your company meet the agility imperative? To start, let’s consider the cloud question.
Many Clouds and Constituencies
In a data-driven enterprise, the needs of everyone — from developers and data analysts to non-technical business users — must be considered when selecting IaaS solutions. For example, application developers who use tools such as Microsoft Visual Studio and .NET will likely have a preference for the integration efficiencies of Microsoft Azure.
Data scientists may want to leverage the Google Cloud Platform for the advanced machine learning capability it supports, while other team members may have a preference for the breadth of the AWS offering. In a decentralized world where it’s easy to spin up solutions in the cloud, different groups will often make independent decisions that make sense for them. The IT team is then saddled with the task of managing problems in the multi-cloud world they inherited — problems that often grow larger than the initial teams expected.
One way to meet a variety of stakeholders’ needs and embrace the latest technology is to plan a multi-cloud environment by design, creating a modern data architecture that is capable of serving the broadest possible range of users. This approach can safeguard you from vendor lock-in, and far more importantly, ensure you won’t get locked out of leveraging the unique strengths and future innovations of each cloud provider as they continue to evolve at a breakneck pace in the years to come.
Integration Approaches for Data Agility
Once perhaps considered a tactical tool, today the right integration solution is an essential and strategic component of a modern data architecture, helping to streamline and maximize data use throughout the business. Your data integration software choice should not only support data processing “anywhere” (on multi-cloud, on-premises, and hybrid deployments) but also enable you to embrace the latest technology innovations and the growing range of data use cases and users you need to serve.
I said “data integration software” as I simply don’t believe that a modern data architecture can be supported by hand-coded integration alone. While custom code may make sense for targeted, simple projects that don’t require a lot of maintenance, it’s not sustainable for an entire modern data architecture strategy.
Hand coding is simply too time-consuming and expensive, requiring high-paid specialists and high ongoing maintenance costs. Moreover, hand-coded projects are tied to the specific platform they were coded to, and often even a particular version of that platform, which then locks the solution to that vendor and technology snapshot. In a continually accelerating technology environment, that’s a disastrous strategic choice. Also, hand coding requires developers to make every change, which limits the organization’s ability to solve the varied and evolving needs of a widely distributed group of data consumers. And finally, it can’t leverage metadata to address security, compliance, and re-use.
Traditional ETL Tools
Traditional ETL tools are an improvement over hand-coding, giving you the ability to be platform agnostic, rely on less specialized staff, and reduce maintenance costs. However, the major drawback with traditional ETL tools is that they require proprietary runtime engines that limit users to the performance, scale, and feature set the engines were initially designed to address.
Almost invariably they can’t process real-time streaming data, and they can’t leverage the full native processing power and scale of next-generation data platforms, which have enormous amounts of industry-wide investment continually improving their capabilities. After all, it’s not simply about having the flexibility to connect to a range of platforms and technologies — the key is to leverage the best each has to offer. Moreover, proprietary run-time technologies typically require software to be deployed on every node, which dramatically increases deployment and ongoing management complexity.
Importantly, this proprietary software requirement also makes it impossible to take advantage of the spin-up and spin-down abilities of the cloud, which are critical to realizing the cloud’s potential elasticity, agility, and cost-saving benefits. Traditional ETL tools simply can’t keep up with the pace of business or market innovation and therefore prevent, rather than enable, digital business success.
Agile Data Fabric
What’s required for the digital era is scalable integration software built for modern data environments, users, styles, and workflow — from batch and bulk to IoT data streams and real-time capabilities — in other words, an agile Data Fabric.
The software should be able to integrate data from the cloud and execute both in the cloud and on-premises. To serve the increasing business need for greater data agility and adaptability, integration software should be optimized to work natively on all platforms and offer a unified and cohesive set of integration capabilities (i.e., data and application integration, metadata management, governance, and data quality). This will allow organizations to remain platform agnostic, yet be in a position to take full advantage of each platform’s native capabilities (cloud or otherwise) and data technology. All the work executed for one technology should be easily transferable to the next, providing the organization with economies of skills and scale.
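To make the idea of transferable work concrete, here is a minimal Python sketch, not any vendor's implementation. It assumes a pipeline can be described as a list of plain record-to-record functions; the step names, the `email` field, and the two runner functions are all hypothetical stand-ins for real batch and streaming engines.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def normalize_email(rec: Record) -> Record:
    """Lower-case and trim the (hypothetical) email field."""
    return {**rec, "email": str(rec["email"]).strip().lower()}


def add_domain(rec: Record) -> Record:
    """Derive a domain field from the email address."""
    return {**rec, "domain": str(rec["email"]).split("@")[-1]}


# The pipeline is defined once, independent of where it runs.
PIPELINE: List[Step] = [normalize_email, add_domain]


def apply_steps(rec: Record, steps: List[Step]) -> Record:
    """Apply each step to a single record, in order."""
    for step in steps:
        rec = step(rec)
    return rec


def run_batch(records: List[Record], steps: List[Step]) -> List[Record]:
    """Stand-in for a bulk engine: processes the whole data set at once."""
    return [apply_steps(r, steps) for r in records]


def run_stream(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Stand-in for a streaming engine: yields one record at a time."""
    for r in records:
        yield apply_steps(r, steps)


rows = [{"email": "  Ada@Example.COM "}, {"email": "bob@test.org"}]
batch_result = run_batch(rows, PIPELINE)
stream_result = list(run_stream(iter(rows), PIPELINE))
```

Because the transformation logic lives in `PIPELINE` rather than in either runner, swapping the execution engine leaves the business rules untouched, which is the kind of re-use the paragraph above describes.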
The other critical capability you should look for in an Agile Data Fabric is self-service data management. Moving from a top-down, centrally controlled data management model to one that is fully distributed is the only way to accelerate and scale organization-wide trustworthy insight. If data is to inform decisions for your entire organization, then IT, data analysts and line of business users all have to be active, tightly coordinated participants in data integration, preparation, analytics, and stewardship. Of course, the move to self-service can result in chaos if not accompanied by appropriate controls, so these capabilities need to be tightly coupled with data governance functions that provide controls for empowering decision makers without putting data at risk and undermining compliance.
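As a rough illustration of self-service coupled with governance controls, the Python sketch below masks sensitive columns based on the requester's role. It is only one possible approach under stated assumptions: the column names, role names, and masking rule are hypothetical and would normally come from a metadata or policy catalog.

```python
from typing import Dict, Set

# Hypothetical governance metadata: which columns are sensitive,
# and which roles may see them unmasked.
SENSITIVE_COLUMNS: Set[str] = {"email", "ssn"}
UNMASKED_ROLES: Set[str] = {"data_steward"}


def mask(value: object) -> str:
    """Hide all but the last two characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]


def view_for(role: str, record: Dict[str, object]) -> Dict[str, object]:
    """Return the record as the given role is allowed to see it."""
    if role in UNMASKED_ROLES:
        return dict(record)
    return {
        col: mask(val) if col in SENSITIVE_COLUMNS else val
        for col, val in record.items()
    }


customer = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"}
analyst_view = view_for("analyst", customer)
steward_view = view_for("data_steward", customer)
```

Every user queries the same data set, but the policy, not the individual user, decides what is exposed, so access can widen without giving up control.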
The challenge CIOs face today is acute — with rapidly advancing platforms and technology, and more sources to connect and users to support than ever before. Meeting these new and ever-evolving data demands requires that companies create a data infrastructure that is agile enough to keep pace with the market and the needs of the organization.
Published at DZone with permission of Mike Tuchen, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.