What Are the Keys to Big Data?

Four keys to a successful big data strategy: know the business problem you're trying to solve; governance and operations; strategy and structure; and speed of delivery.

To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "What are the keys to a successful big data strategy?" Here's what they told us:

ID the Business Problem

  • Understand the problem you’re trying to solve. Frequently, that’s the complexity and scale of the datasets. We use OLAP to garner insights through rapid aggregation of large datasets or streams of data. We help take large streams of data and ingest them so you can slice and dice to gather insights.
  • Blurring of products and toolsets around databases and big data. Take advantage of innovation in the stack. Be case-driven and focus on outcomes and results. Help customers make practical decisions.
  • Look at the requirements and the problems you are looking to solve, such as performance or data access, so you can make the right choice of technology.
  • A big data strategy with a sole purpose of exploring possibilities is likely to end up in misunderstanding. An efficient strategy must be driven by a pragmatic approach: first identify the business problems to solve and validate assumptions through experiments involving internal users and customers. Optimize a legacy data warehouse by moving data to modern infrastructure and technologies better suited to handling and processing large data volumes and to running complex algorithms and analyses. A cloud architecture can often be considered at this stage. Build common practices and an architecture framework around the concept of the data lake to quickly unleash the value of data. By doing so, companies can benefit from a robust and scalable architecture built on multiple layers to properly collect and ingest data at different paces, store large amounts of raw data in any format, protect sensitive information, manage data quality at the expected level, refine data if necessary, and derive data to quickly allow analysis and accessibility.
    • Lead the change toward information as the company’s fuel by applying the right user experience (UX) to how data is delivered to employees or the company’s digital ecosystem. New customer-oriented marketing based on big data should be used to improve and even alter current marketing practices. Take care not to fall into excessive statistics or spurious correlations that go beyond the actionable steps customers need. The data, analytics, and insights generated by analysts must be communicated precisely and openly to internal users. The final information should be presented so that its value is connected to actions the implementation team can take.
  • Having a clear and worthy end goal; practical implementation strategies that limit the risk in these likely huge undertakings; and choosing the right technologies and partners for the right job.
  • It’s not actually about the big data itself. Firstly, it’s about identifying the current processes or actions that are present in your business (or project). By identifying, I explicitly mean document — using BPMN (business process model and notation), UML, word processor, whiteboards, chalkboards, cameras on phones, napkins, etc. This crucial step is required to enable, if not force, all technology decisions (selection and implementation) to benefit and be driven by the needs of the business. When approached in this manner several benefits emerge that not only help identify where to employ a big data analytic approach but how to measure improvements:
    1. The individual steps needed to achieve the business goals via one or more processes are revealed to all stakeholders.
    2. The constraints (inputs, outputs, allotted time) required when executing each step are also identified. For instance, Step 1 may need to complete in 3 seconds.
    3. The data required for each step, process, and business goals are identified. Importantly, the data is identified within the scope of the business needs. (We want to avoid selecting technology [acquiring or building] for any other reason. The battlefield of business history is littered with efforts that selected technology because it was popular, cool, familiar, low or no cost or a resume builder.)
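
The three benefits above can be made concrete by writing the documented process down as data. The following is a minimal sketch, not a prescribed method: the `ProcessStep` class, the step names, the field names, and the measured timings are all hypothetical, and only the 3-second budget for Step 1 comes from the example in the list.

```python
from dataclasses import dataclass


@dataclass
class ProcessStep:
    """One documented step of a business process (hypothetical structure)."""
    name: str
    inputs: list[str]    # data the step consumes
    outputs: list[str]   # data the step produces
    max_seconds: float   # allotted time for the step


def over_budget(steps, measured_seconds):
    """Return the names of steps whose measured duration exceeds the allotted time."""
    return [s.name for s in steps
            if measured_seconds.get(s.name, 0.0) > s.max_seconds]


def external_inputs(steps):
    """Return data the process needs that no step in the process produces."""
    produced = {item for s in steps for item in s.outputs}
    needed = {item for s in steps for item in s.inputs}
    return sorted(needed - produced)


# Illustrative process; every name and number except the 3-second budget is assumed.
steps = [
    ProcessStep("Step 1", inputs=["customer_id"],
                outputs=["customer_profile"], max_seconds=3.0),
    ProcessStep("Step 2", inputs=["customer_profile", "order_history"],
                outputs=["recommendation"], max_seconds=10.0),
]
measured = {"Step 1": 4.2, "Step 2": 6.5}  # hypothetical timings in seconds

print(over_budget(steps, measured))   # steps that miss their constraint
print(external_inputs(steps))         # data the business process actually requires
```

Keeping the constraints next to the steps gives a baseline to measure against before any technology is chosen, which is the point the contributor makes.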

Governance and Operations

  • Information governance and management. Customers don’t know where all of their data is. We identify all of the data and where it resides. We take aged data and determine how recently it has been accessed. While data is growing rapidly, there’s always a lot more data that’s 30+ days old than active data less than 30 days old.
  • Organization is one of the largest keys to a successful big data strategy, whether that’s internal organization with clear definitions of roles for collecting, analyzing, and acting upon this data or utilizing the right platforms and tools for strategic goals. Making sure there are processes that organize data streams into usable information, and then translating that into understandable action points are challenges a lot of enterprises face.
  • There are a number of key components for a successful big data strategy, but possibly the most practical advice is to build an initial data lake (or “dataland,” as we like to call it) for data scientists to use as a playground to ideate new applications for that data. This can turn into a number of quick wins even before a more disciplined approach to big data is realized. However, at some point, data governance becomes extremely important, since, most likely, some of the data assets that will end up in the data lake will have sensitive information that must be guarded appropriately.
    • There is also the talent factor, which must be considered from the start for any initiative to succeed. You will need to identify within the organization (or otherwise hire) individuals who could be considered data scientists: a combination of statistics/math buff with a soft spot for data who can also do reasonably good work at programming. Such candidates are not exactly growing on trees. Given the option, it’s probably more effective to grow them from existing people in your organization who have domain expertise and data skills, building statistical acumen on top of those and possibly relying on existing toolsets (both open-source and commercially available) to supplement their limited programming ability.
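
The first point in this section, separating active data from aged data by how recently it was accessed, can be sketched in a few lines. This is only an illustration under stated assumptions: the catalog entries, the dates, and the `split_by_age` function are hypothetical, and the 30-day threshold is the one mentioned in the text.

```python
from datetime import datetime, timedelta


def split_by_age(last_accessed, now, threshold_days=30):
    """Split dataset names into (active, aged) lists by last access time."""
    cutoff = now - timedelta(days=threshold_days)
    active, aged = [], []
    for name, accessed_at in sorted(last_accessed.items()):
        # Anything not touched since the cutoff counts as aged data.
        (active if accessed_at >= cutoff else aged).append(name)
    return active, aged


# Hypothetical inventory: dataset name -> last access timestamp.
catalog = {
    "orders.parquet": datetime(2018, 5, 28),
    "crm_export.csv": datetime(2018, 3, 15),
    "logs_2016.csv": datetime(2016, 12, 31),
}
now = datetime(2018, 6, 1)  # fixed "current" date so the example is reproducible

active, aged = split_by_age(catalog, now)
print("active:", active)
print("aged:", aged)
```

In practice the timestamps would come from the storage systems themselves; the sketch only shows the classification step once that inventory exists.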

Strategy and Structure

  • We expected specialized technologies and new opportunities to do distributed computing. Now we’re seeing organizations with flat IT budgets looking to take costs out of IT so they can focus on innovation. Data volumes are growing. You need to focus on the process versus the data. Understand that data is the enabling layer but also the obstacle to pursuing microservices, AI, and IoT. What data fabric strategy are you going to use to pursue new technologies such as multi-cloud, hybrid processes, and microservices?
  • From my experience, it is imperative to develop a strategy incrementally, starting with key use cases that can benefit from big data technology before moving to broad, organization-wide projects. Beyond that, a successful strategy involves getting the right people in place that understand both the technology spectrum and the business goals and picking the right architecture. Data streaming technology can be a huge help in creating an architecture that connects different data silos, data lakes, etc. Also important is putting the right governance policies in place for making data accessible across an organization.
  • Pay attention to the structure and the quality of the data as you are moving it or ingesting. Have a process for maintaining the structure and the quality.
  • One of the keys to a successful big data strategy is the ability to use necessary data efficiently in the enterprise for analytics/machine learning. This requires real-time, easy access to the data wherever it is stored, with lineage preserved, and without the traditional ETL process that results in multiple copies and delays in accessing the data.
  • The most successful organizations that have adopted a big data strategy reported these observations: A unified, comprehensive, and flexible data management platform enables speed, reusability, and trust with far less manpower than traditional manual and complex approaches. Such a platform also lets organizations focus manpower on business logic and business context instead of being delayed by the complexities of an ever-changing infrastructure ecosystem. They also cite the ability to leverage AI-driven technology to guide user behavior and automate processes.
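
The advice above about maintaining structure and quality while ingesting data can be illustrated with a small validation gate. This is a sketch under assumptions, not a description of any contributor's product: the schema in `REQUIRED_FIELDS`, the sample records, and the function names are all hypothetical.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"id": int, "amount": float, "currency": str}


def validate(record):
    """Return a list of structural problems found in one record."""
    problems = []
    for field_name, expected_type in REQUIRED_FIELDS.items():
        if field_name not in record:
            problems.append(f"missing {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"{field_name} is not {expected_type.__name__}")
    return problems


def ingest(records):
    """Separate records into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(record)
    return accepted, rejected


# Illustrative batch: one clean record, one with a wrong type, one incomplete.
batch = [
    {"id": 1, "amount": 9.5, "currency": "USD"},
    {"id": "2", "amount": 3.0, "currency": "EUR"},
    {"id": 3, "currency": "USD"},
]
accepted, rejected = ingest(batch)
print(len(accepted), "accepted;", len(rejected), "rejected")
for record, problems in rejected:
    print(record, "->", problems)
```

Rejected records are kept together with their reasons rather than dropped, so the quality process has something to act on.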

Speed of Delivery

  • Real-time analytics and fast data processing. Don’t fall into cluster sprawl. Complexity can be high. Leverage Spark to innovate on big data in a more concise way. Simplify architecture and performance. Focus on high performance so you can get answers quickly.
  • A work in progress that’s not easily solved. Connect quickly with a self-service model to empower business analysts and data scientists to be self-sufficient and independent.
  • Pull together data sources to answer questions. Don’t get stuck using a single system. Leave the data where it lives naturally and do the analytics there. If you bring the data to a data warehouse, it’s out of date. Answer questions quickly. Run different “what if” scenarios and respond to questions in real time.
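
As a toy illustration of pulling data sources together without first copying them into one warehouse, the sketch below joins tables that live in two separate SQLite databases through a single connection, using SQLite's `ATTACH DATABASE`. It is a small stand-in for the idea, not the contributors' technology: the table names, columns, and values are all hypothetical, and both databases are in-memory so the example is self-contained.

```python
import sqlite3

# The main database stands in for an order system (hypothetical).
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for a CRM system (hypothetical).
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, total REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, region TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)",
                 [(1, 120.0), (2, 80.0), (1, 40.0)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                 [(1, "EMEA"), (2, "APAC")])


def revenue_by_region(connection):
    """Answer one question across both databases without moving rows between them."""
    query = """
        SELECT c.region, SUM(o.total)
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return connection.execute(query).fetchall()


print(revenue_by_region(conn))
```

Real deployments span different engines and networks, which this sketch does not attempt to model; it only shows the query-in-place pattern.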


Other

  • Back-up, recovery, and protection, especially with the growth of ransomware. Data is business critical. Treat it as such.
  • The more metadata you have the more it can work for you.
  • Data operations: ensuring data is moving across the enterprise while you keep your finger on where it is and what it’s being used for. Operational oversight, quality, and SLAs. Big data can be difficult for companies to use. Kafka is complex and can be difficult to get started with. We provide a UI that removes the requirement for upfront programming.
  • Municipalities have been collecting data for a long time. That data has been collecting dust. We provide the resources for non-technical people to clean and work with the data.

Here’s who we spoke to:

  • Emma McGrattan, S.V.P. of Engineering, Actian
  • Neena Pemmaraju, VP, Products, Alluxio, Inc.
  • Tibi Popp, Co-founder and CTO, Archive360
  • Laura Pressman, Marketing Manager, Automated Insights
  • Sébastien Vugier, SVP, Ecosystem Engagement and Vertical Solutions, Axway
  • Kostas Tzoumas, Co-founder and CEO, Data Artisans
  • Shehan Akmeemana, CTO, Data Dynamics
  • Peter Smails, V.P. of Marketing and Business Development, Datos IO
  • Tomer Shiran, Founder and CEO, and Kelly Stirman, CMO, Dremio
  • Ali Hodroj, Vice President Products and Strategy, GigaSpaces
  • Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
  • Fangjin Yang, Co-founder and CEO, Imply
  • Murthy Mathiprakasam, Director of Product Marketing, Informatica
  • Iran Hutchinson, Product Manager and Big Data Analytics Software/Systems Architect, InterSystems
  • Dipti Borkar, V.P. of Products, Kinetica
  • Adnan Mahmud, Founder and CEO, LiveStories
  • Jack Norris, S.V.P. Data and Applications, MapR
  • Derek Smith, Co-founder and CEO, Naveego
  • Ken Tsai, Global Vice President, Head of Database and Data Management Product Marketing, SAP
  • Clarke Patterson, Head of Product Marketing, StreamSets
  • Seeta Somagani, Solutions Architect, VoltDB
