What’s Next for Big Data?
In just a few short years, Big Data has already transformed the way companies do business, and we’ve only just begun to scratch the surface. As companies have learned to gather all sorts of data, they’ve begun to see the potential in what lies ahead for putting that data to good use.
Some transformative companies are finding that their data could actually be their biggest asset. Not only are these data-savvy companies able to learn about and better serve their customers through insights gained from data, but they are also finding ways to monetize their data by selling it to partners and downstream vendors. For example, services like Uber and Lyft are gathering tremendously insightful data about customers’ travel habits, as are sites like Airbnb, VRBO and others. Meanwhile, Fitbit and other companies that offer fitness trackers have discovered tremendous value in the health and activity data their users monitor and upload. Even Apple, which certainly isn’t in the business of health care, now has unprecedented insight with its native Health app.
In theory, this massive treasure trove of data opens a whole new world of opportunities for both B2B and B2C companies to gather and act on insights in ways they never imagined. But, because of some significant technical and financial obstacles, not every company has figured out what’s next. They’ve dipped their toes into the data mining waters, but haven’t yet devised a solid strategy for how to move forward.
Why the Challenge?
One of the biggest obstacles to realizing the promise of Big Data is the massive financial investment required. So far, most successes have come through multimillion-dollar projects like @WalmartLabs, Walmart's dedicated data innovation lab. But Walmart is the world's largest company, with very deep pockets and virtually endless resources, which sets a standard that very few companies can hope to match.
What makes actually leveraging Big Data so resource intensive? There are three primary reasons:
Data is coming in faster, and from a rapidly increasing number of sources: mobile devices, cloud applications, real-time social media streams and the Internet of Things, where everything from the RF tags that track inventory and equipment to household appliances now seems to be "online."
Almost all of these new sources deliver data in unstructured or semi-structured formats, which renders conventional relational database management, the basis of SQL and nearly all modern database systems, virtually useless. Beyond the collection and storage challenges, privacy and regulatory compliance requirements add a significant new layer of complexity, with constantly evolving standards that demand an entire team, along with advanced technology, to manage and maintain.
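The schema problem is easy to see with two hypothetical semi-structured records (invented here for illustration, not taken from the article): events from different sources share almost no fields, so a single fixed relational table would need either constant schema changes or columns full of nulls.

```python
import json

# Hypothetical semi-structured events from two of the sources mentioned
# above: an RF inventory tag and a social media stream. The field sets
# barely overlap, which is what defeats a fixed relational schema.
events = [
    '{"source": "rf_tag", "tag_id": "A-1042", "zone": "warehouse-3", "ts": 1700000000}',
    '{"source": "social", "user": "jdoe", "text": "loving the new app", "likes": 17, "ts": 1700000042}',
]

parsed = [json.loads(e) for e in events]

# Compare the field sets: the union shows how wide a table would need
# to be, while the intersection shows how little the records share.
all_keys = set().union(*(rec.keys() for rec in parsed))
shared_keys = set(parsed[0]) & set(parsed[1])
print(sorted(shared_keys))  # only "source" and "ts" are common to both
```

Document stores and data lakes sidestep this by keeping each record's own structure intact rather than forcing everything into one table up front.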
As Big Data has gotten more complex, the technologies for managing data have also grown increasingly complex. Open source tools like Hadoop, Kafka, Hive, Drill, Storm, MongoDB, Cassandra and more, plus a litany of proprietary spin-off and competing solutions, all require deep technical expertise to operate and apply in a business setting. Such expertise is scarce, and both difficult and costly for most non-Fortune 500 companies to acquire.
It’s easy to see why the vast majority of companies are struggling to merely manage and mine their data stores, let alone actually use that data to their advantage. There is a tremendous void in practical, useful and realistic tools that enable the average business to effectively capitalize on their data. To be clear, there’s hardly a shortage of Big Data tools—but efficient, effective solutions that don’t create data silos and giant inter-dependent loops that are extremely difficult to maintain are sorely lacking.
Why? So far, the focus has been on integrating applications or building connections between various independent tools and platforms to make them work together—linking CRM and help desk ticketing systems, for example, or CRM to ERP, or sales tools with marketing automation.
The problem with this app-to-app approach is that it completely ignores the data, which may still very likely remain splintered, siloed or fragmented. Even though the applications may connect, if each application has its own data storage, the data may not. This results in incomplete or duplicated records and generally “dirty” data. Any analysis that takes place is therefore patently unreliable, because the data itself is unreliable.
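A minimal sketch of that failure mode, using invented records (the names, fields and matching rule here are hypothetical, not from the article): two "connected" applications each keep their own copy of a customer, and the copies quietly disagree.

```python
# Hypothetical: the "same" customer as stored by two independently
# integrated applications, e.g. a CRM and a help desk tool.
crm_record = {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "555-0101"}
helpdesk_record = {"name": "J. Doe", "email": "JANE.DOE@example.com", "phone": None}

def same_customer(a, b):
    # A naive match on normalized email; real entity resolution
    # is far harder, which is exactly the point.
    return a["email"].strip().lower() == b["email"].strip().lower()

# The apps may be wired together, yet their stores disagree on name
# and phone, so any per-app report double-counts this one person.
print(same_customer(crm_record, helpdesk_record))  # True: one customer, two conflicting records
```

Multiply this by thousands of records and dozens of applications, and the "dirty data" problem the paragraph describes becomes unavoidable.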
What’s It Going to Take?
In order to truly get a handle on Big Data—and start using it for insight and business growth, rather than just collecting it—a new approach is needed that focuses on the data itself, not the applications. Addressing integration at the data level, rather than the application level, is essential for any Big Data initiative to succeed.
By marrying integration and data management into a single unified platform that helps to build a comprehensive, clean, source-agnostic data lake, businesses could create a foundational single source of truth that’s easily accessible to write or read by any source or analytics application. Not only would this open the door to connecting virtually any application to the right data in the right way for virtually any purpose, but it would also dramatically improve the efficiency, accuracy and trustworthiness of analysis.
iPaaS Is the Answer? Not So Fast…
While some have touted iPaaS (Integration Platform as a Service) as the solution, this self-service approach still puts the burden of complex integration work on the internal team, assuming the company has the resources and wants its IT and business staff to manage integration “plumbing.” As the need for new integrations grows at an exponential rate, there’s no roadmap for smooth scale-out in an iPaaS approach, and compliance and data governance can easily become compromised along the way. Giving business users the ability to configure integrations independently of IT can open gaping holes in an organization's security and compliance posture, inadvertently exposing it to a breach or penalty. It can also create the very data silos and unsupportable one-off implementations that IT’s integration strategy was designed to prevent.
Eventually, what was promised to be simple, less expensive and expandable becomes another dead end. With iPaaS, there is limited future readiness; in essence, it’s just a temporary fix that must be repeated over and over as needs grow and change.
The Ideal Solution: dPaaS Makes Big Data Success a Reality
Thankfully, there’s an entirely new approach to Big Data management and integration that is finally giving businesses of all sizes an effective, manageable, scalable and future-ready way to leverage Big Data.
Data Platform as a Service, or dPaaS, is a unified multi-tenant, cloud-based platform that provides integration and data management as fully managed services for a more flexible and data-centric, application-agnostic way to meet nearly any Big Data need. Rather than focusing on integrating applications, dPaaS integrates the data, ensuring cleanliness, quality, accessibility and compliance across every application that reads or writes to the data lake.
With dPaaS, companies can say “goodbye” to data silos and complex, costly integration projects, and instead enjoy the ability to add new applications at any time, draw from a consolidated data repository and retain complete visibility of the full data lifecycle, all with built-in compliance and governance.
Here are a few key features.
Unified Data Management
With dPaaS, an organization’s entire data repository is managed in a single, comprehensive store. Whereas iPaaS and app-to-app integrations can leave data silos, mismatched fields, missing values, duplications and other “dirty” data issues, dPaaS maintains the data independent of applications. It creates and persists a schema-less, central repository complete with the requisite metadata relationships to mesh with virtually any data source, enabling businesses to easily add new applications any time with the confidence that the data will be clean, comprehensive and accurate.
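The idea of a schema-less central repository with provenance metadata can be sketched in a few lines. This is a toy model of the concept, not any vendor's actual dPaaS API; the class and method names are invented for illustration.

```python
from datetime import datetime, timezone

class DataLake:
    """Toy schema-less central store: any source writes, any app reads."""

    def __init__(self):
        self.records = []

    def write(self, source, payload):
        # Persist the payload as-is, wrapped in metadata that links it
        # back to its source instead of forcing it into a fixed schema.
        self.records.append({
            "source": source,
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            "payload": payload,
        })

    def read(self, source=None):
        # Applications filter by metadata rather than reaching into
        # per-application silos.
        return [r["payload"] for r in self.records
                if source is None or r["source"] == source]

lake = DataLake()
lake.write("crm", {"customer": "Jane Doe", "tier": "gold"})
lake.write("fitness_tracker", {"steps": 10432})
print(len(lake.read()))   # 2: one repository holding both sources
print(lake.read("crm"))   # only the CRM payloads
```

A real platform adds indexing, access control, lineage and compliance machinery on top, but the core contrast with per-application storage is the same: one repository, many shapes of data, metadata as the connective tissue.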
Keeping up with constantly evolving compliance requirements is becoming increasingly difficult and expensive, with time-consuming and resource-intensive audits and continuous re-certifications. With dPaaS, however, compliance is assured at the data level on continuously certified infrastructure maintained by the platform provider, ensuring a holistic approach to compliance rather than a piecemeal, application-by-application one. dPaaS also shifts the bulk of the compliance burden to the provider, keeping data compliant in all states, both at rest and in motion.
Center of Excellence
dPaaS builds an integration center of excellence (COE) that allows even SMBs to leverage the resources, knowledge, processes, tools and talent of the vendor to achieve greater efficiency and tackle more complex business processes and challenges. Building a COE internally would be practically impossible for all but the largest teams, but with dPaaS, the COE comes standard. The platform vendor provides the experts, resources and tools to deliver a comprehensive integration COE, allowing a business of virtually any size to leverage cutting-edge expertise and services.
Unlike do-it-yourself iPaaS solutions, dPaaS shifts the burden of integration complexity onto the platform provider, who takes responsibility for ETL and the other “plumbing” processes that form the basis of the integration. This is not only far more cost-effective for the business, but also provides continuous access to the latest technologies from a provider with a competitive incentive to stay on the cutting edge. That means internal staff and budget can be applied to more strategic projects that drive revenue and serve the organization’s core mission.
The Future Is Bright With dPaaS
With its comprehensive, unified approach to data integration and management, dPaaS is showing tremendous promise to move us past the data mining stage to the data leveraging stage of Big Data. By providing all of the tools and expertise—as well as a future roadmap—dPaaS enables businesses to launch and operate a Big Data initiative far more efficiently, effectively and affordably than a DIY approach.
Instead of wasting time and expertise on janitorial work to clean, harmonize and synchronize data and reinventing the wheel with each new application integration, dPaaS allows each business to focus on extracting actionable intelligence quickly and accurately to gain—and maintain—a competitive edge.
Opinions expressed by DZone contributors are their own.