Keys To Success: Big Data Strategy
Knowing the problem you are trying to solve is crucial to your success with Big Data.
To gather insights on the state of big data today, we spoke with 22 executives from 20 companies who are working in big data themselves or providing big data solutions to clients. Here’s who we talked to:
- Nitin Tyagi, Vice President Enterprise Solutions, Cambridge Technology Enterprises
- Ryan Lippert, Senior Marketing Manager and Sean Anderson, Senior Product Marketing Manager, Cloudera
- Sanjay Jagad, Senior Manager, Product Marketing, Coho Data
- Amy Williams, COO, Data Conversion Laboratory (DCL)
- Andrew Brust, Senior Director Market Strategy and Intelligence, Datameer
- Eric Haller, Executive Vice President, Experian DataLabs
- Julie Lockner, Global Product Marketing, Data Platforms, Intersystems
- Jim Frey, V.P. Strategic Alliances, Kentik
- Eric Mizell, Vice President Global Engineering, Kinetica
- Rob Consoli, Chief Revenue Officer, Liaison
- Dale Kim, Senior Director of Industrial Solutions, MapR
- Chris Cheney, CTO, MPP Global
- Amit Satoor, Senior Director, Product and Solution Marketing, SAP
- Guy Levy-Yurista, Head of Product, Sisense
- Jon Bock, Vice President of Product and Marketing, Snowflake Computing
- Bob Brodie, CTO, SUMOHeavy
- Kim Hanmark, Director of Professional Services EMEA, TARGIT
- Dennis Duckworth, Director of Product Marketing, VoltDB
- Alex Gorelik, Founder and CEO and Todd Goldman, CMO, Waterline Data
- Oliver Robinson, Director and Co-Founder, World Programming
We asked, "What are the keys to a successful big data strategy?" Here's what they told us:
- Keep your data in one place. Ensure the architecture and platform decisions are correct up front. Do not introduce excessive overhead. Have a system that can recover from failure quickly; recovery becomes harder as systems grow more disparate.
- Have a clear understanding of the big data problem you are trying to address, along with clear metrics and ROI expectations: for example, incorporating Twitter data into your statistics and responding to it in a timely manner. Don’t say “big data”; use a precise description of the problem you are trying to solve.
- You're going to have lots of data sources, so you need to 1) integrate the different sources, 2) cleanse the data and transform it into a consistent and usable format, and 3) drive business insight. Take disparate data sources and look for correlations in ways that haven’t been done before.
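The three steps in the bullet above can be sketched with a minimal, illustrative pipeline. All record shapes and field names here (a CRM export joined to a web-analytics feed) are hypothetical, chosen only to make the integrate/cleanse/insight flow concrete:

```python
# Hypothetical source records from two systems.
crm = [{"customer_id": 1, "region": " EMEA "}, {"customer_id": 2, "region": "apac"}]
web = [{"customer_id": 1, "visits": 14}, {"customer_id": 2, "visits": 3}]

def integrate(crm_rows, web_rows):
    """1) Integrate the different sources: join on customer_id."""
    visits = {r["customer_id"]: r["visits"] for r in web_rows}
    return [{**r, "visits": visits.get(r["customer_id"], 0)} for r in crm_rows]

def cleanse(rows):
    """2) Cleanse and transform into a consistent, usable format."""
    return [{**r, "region": r["region"].strip().upper()} for r in rows]

def insight(rows):
    """3) Drive business insight: which region is most engaged?"""
    return max(rows, key=lambda r: r["visits"])["region"]

rows = cleanse(integrate(crm, web))
print(insight(rows))  # -> EMEA
```

In a real deployment each step would be a distinct stage (ETL jobs, a data-quality layer, an analytics store), but the ordering — integrate, then cleanse, then analyze — is the point.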
- Work on the front end, preparing content for analysis by enriching and normalizing the data.
- 1) First, it is essential to pick the right architecture and engine for the design objectives at hand. Most choices can handle the volume challenges, but not all deal equally well with velocity and variability. In our case, we had additional design constraints that guided our strategy. We had to support highly performant streaming ingest while also delivering extremely fast query response. We also needed data partitioning and system fairness as essential aspects of a multi-tenancy. 2) Next, build fluency in the chosen technology, understand its limits, and figure out where you are going to have to fill gaps early in the process. We changed data storage layers more than once, and we completely re-evaluated our approach to the indexing layer because our initial assumptions about data structure proved untrue. 3) Then, take control of the operating environment if you need to ensure high performance. AWS is great for dev, but if you want deterministic responsiveness, you’ll probably have to run it yourself. 4) Finally, keep your eyes open and never stop looking at the evolving big data technology landscape. There’s great new stuff coming out all the time. You can’t afford to change the whole architecture every month, but you also don’t want to get locked in so deep that you can’t take advantage when something better comes along.
- The data value chain is where you pour in a lot of data and pull out a few selective insights. For example, you can take IoT data from windmills and solar cells and recognize unexpected correlations.
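As a toy illustration of pulling one selective insight from a stream of readings, a Pearson correlation can flag an unexpected relationship between two sensor series. The windmill readings below are invented for the sketch:

```python
import math

# Hypothetical hourly readings from a windmill's sensor feed.
wind_speed = [5.1, 7.4, 6.0, 9.2, 4.3, 8.8]
vibration  = [0.9, 1.6, 1.2, 2.1, 0.7, 1.9]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A coefficient near 1.0 flags a strong relationship worth investigating,
# e.g. vibration tracking wind speed more closely than expected.
print(round(pearson(wind_speed, vibration), 3))  # -> 0.998
```

At big data scale the same idea runs over millions of sensor pairs; the value chain is exactly this funnel from many raw readings down to one number that prompts a question.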
- We’re both a user and provider. Make sure to make data and insights accessible to as many people as possible. Data used to be siloed. Make access to data insights available across the company. Deploy technical solutions to support business goals that can be extended to support the rest of the company.
- Wide adoption of data-driven decision making. Get the right data to the right people. We’ve been binge drinking on big data, adding data to data lakes and turning them into data swamps. Use automatic cataloging and semantic tagging to crawl the data lake and the enterprise and organize the data. Improve fingerprinting with machine learning. Help people find and use the data they need.
- Start with a data lake like Hadoop or a network SAN. Build out the metadata, security, and access controls. Emerging on top is the fast data layer, which enables users to bring data together and act on it quickly.
- Collect, analyze, and act on large amounts of data in real time. Putting data to good use is a strategic advantage; derive value from data in a tangible way. Data sprawl is a real risk, and governance and tools offer many ways to execute big data projects. Companies have success when they think beyond big data as a finance project. Standardizing projects is another challenge, as is keeping up to date with the latest releases, so that teams can build the kind of enterprise practice they are used to and the infrastructure evolves to meet the needs of the business workload.
- Understand the problem you are trying to solve. Some companies buy a Hadoop cluster without knowing what they’re trying to accomplish, then find they are unable to do joins and queries the way they are accustomed to. Customers are successful with Hadoop when they know what problem they are solving.
- Think about your structured data and what big data might mean for it. Be clear about what the business need is; don’t be driven by technology and tools. Big data is a long-term, expensive play, so justify the requirements and the return along the way.
- Understand what’s available to you and frame the problem you are trying to solve. Identify the value in the interactions between the data sets. Mitigate risk, upsell, and cross-sell using inventory data.
- Have a strategy. Put data in a data lake. Identify specific use cases and execute them so you can see the time to value. Show benefits deeper down, or to other parts of, the organization.
- Avoid malaise about where to go next. We were excited about the technology versus traditional data warehouses and business intelligence tools, but everything needs to be cost-justified and useful, and that comes down to use cases. Data is an asset: where can it be leveraged, and what’s required to get the desired results? Identify and map use cases before the work begins in earnest. Involve business units, IT, and business analysts, and reach consensus on the value of the use cases to be pursued. We facilitate use-case discovery workshops: collaborative whiteboarding sessions in which business teams present use cases and the expected results. Create an instinct that, as one use case concludes, you roll into the next one. A back-to-basics approach.
- Solve real data science problems through stealth engagements. Start by helping with a technical problem; start small and scale fast, then go back and request funding. Don’t try to validate the entire data science initiative at one time. Pick a business unit and solve their well-defined data science problem.
- Everyone’s collecting a bunch of data without giving thought to what they want to do with it.
- Innovation without disruption. Many organizations need to leverage existing systems to maximize their current and future investments. Technology and solutions that enrich their business environments need to conform to industry standards and enable innovation without disruption. However, from a business and technical perspective, disruption can manifest itself in various forms, which means customers need to focus on the right areas and ask the right questions when evaluating the resources they may need to support new technologies and solutions. For example: 1) Data acquisition: can existing tools be used to acquire the different types of data needed to support a broader range of analytics? 2) Integration: existing systems and tools need to be leveraged and reused; any new system has to integrate seamlessly and have a clear migration path. 3) Data storage: various data assets across the enterprise span different time periods, with some data artifacts offering businesses more value than others; how much more depends on the criticality of query speed. 4) Data duplication and proliferation: organizations have long struggled with this issue. Each new business imperative results in the creation of yet another data mart and duplicated data. Any new solution has to collapse this mountain of data and provide data consolidation to simplify the IT landscape.
- The key to a successful big data strategy is having a well-thought-out upfront understanding of what you want to get out of it from a business perspective. Being able to collect certain big data sources does not necessarily build a good business case. The Business Model Canvas is a great way of understanding what impacts your business and therefore what makes sense to gather data about.
- It is so important to clearly understand and define what you are trying to achieve and what your business goals are. Simply processing data is a meaningless and costly exercise without a defined end objective. It sounds obvious, but companies often start at the engineering end of the problem without clearly knowing what they are trying to achieve. Executives can get wrapped up in the buzz of the technology: “machine learning,” “Hadoop,” and “artificial intelligence.” While you do need to understand the technology concepts, you must first focus on the business objectives. Once you’ve done this, the next step is solving the engineering and development side of the project. The end goal might be providing aggregated views of the data, providing a dataset for Exploratory Data Analysis (EDA), or providing data to feed a machine learning or artificial intelligence objective. Either way, 80 to 90 percent of the resource cost will likely go to data processing, data preparation, and engineering effort.
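The 80-to-90-percent claim is easy to see in miniature: even for a trivial analysis, most of the code is preparation. The records and field names below are hypothetical, invented purely to illustrate the ratio:

```python
# Hypothetical raw export: inconsistent types and formats, as real feeds are.
raw = [
    {"amount": "102.50", "country": "us"},
    {"amount": "n/a",    "country": "US "},
    {"amount": "98.10",  "country": "US"},
]

def prepare(records):
    """The preparation step: parse numbers, normalize codes, drop bad rows."""
    cleaned = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # discard rows that cannot be parsed
        cleaned.append({"amount": amount, "country": r["country"].strip().upper()})
    return cleaned

def summarize(records):
    """The 'analysis' is then nearly a one-liner: mean transaction amount."""
    amounts = [r["amount"] for r in records]
    return sum(amounts) / len(amounts)

print(summarize(prepare(raw)))  # -> 100.3
```

Whether the end objective is an aggregated view, an EDA dataset, or a model-training feed, the `prepare`-shaped work dominates; the final computation is comparatively small.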
What are the keys to a successful big data strategy from your perspective?
Opinions expressed by DZone contributors are their own.