What Are the Keys to Big Data?
Four keys to a successful big data strategy are: knowing the business problem you're trying to solve; governance and operations; strategy and structure; and speed of delivery.
To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "What are the keys to a successful big data strategy?" Here's what they told us:
Identify the Business Problem
- Understand the problem you’re trying to solve. Frequently, that problem is the complexity and scale of the datasets. We use OLAP to garner insights through rapid aggregation of large datasets or data streams. We help ingest large streams of data so you can slice and dice them to gather insights.
- It’s not actually about the big data itself. First, it’s about identifying the current processes or actions in your business (or project). By identifying, I explicitly mean documenting them, using BPMN (Business Process Model and Notation), UML, a word processor, whiteboards, chalkboards, phone cameras, napkins, etc. This crucial step is required to enable, if not force, all technology decisions (selection and implementation) to be driven by, and to benefit, the needs of the business. When approached this way, several benefits emerge that help identify not only where to employ a big data analytic approach but also how to measure improvements:
- The individual steps needed to achieve the business goals via one or more processes are revealed to all stakeholders.
- The constraints (inputs, outputs, allotted time) required when executing each step are also identified. For instance, Step 1 may need to complete in 3 seconds.
- The data required for each step, process, and business goal is identified. Importantly, the data is identified within the scope of the business needs. (We want to avoid selecting technology, whether acquiring or building it, for any other reason. The battlefield of business history is littered with efforts that selected technology because it was popular, cool, familiar, low- or no-cost, or a resume builder.)
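The documentation step described above can also be captured in a machine-readable form alongside the whiteboard sketch. Below is a minimal Python sketch, assuming a hypothetical `ProcessStep`/`BusinessProcess` structure (not BPMN itself, and not any specific tool). Each step records its inputs, outputs, and allotted time. The structure then lists the data the process needs and flags steps whose measured duration breaks their time constraint, which is one way to measure improvements. The process, step names, and time budgets are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One documented business-process step (hypothetical structure)."""
    name: str
    inputs: list[str]
    outputs: list[str]
    max_seconds: float  # allotted time for this step


@dataclass
class BusinessProcess:
    name: str
    goal: str
    steps: list[ProcessStep] = field(default_factory=list)

    def required_data(self) -> set[str]:
        """All data items consumed by any step, scoped to this business process."""
        return {item for step in self.steps for item in step.inputs}

    def violations(self, measured: dict[str, float]) -> list[str]:
        """Names of steps whose measured duration exceeds the allotted time.

        Steps with no measurement are treated as 0.0 seconds (not a violation).
        """
        return [
            step.name
            for step in self.steps
            if measured.get(step.name, 0.0) > step.max_seconds
        ]


# Illustrative process: step 1 must complete in 3 seconds.
order_flow = BusinessProcess(
    name="order approval",
    goal="approve valid orders quickly",
    steps=[
        ProcessStep("validate order", ["order", "customer record"], ["validated order"], 3.0),
        ProcessStep("score fraud risk", ["validated order", "payment history"], ["risk score"], 5.0),
    ],
)

print(sorted(order_flow.required_data()))
# → ['customer record', 'order', 'payment history', 'validated order']
print(order_flow.violations({"validate order": 4.2, "score fraud risk": 1.1}))
# → ['validate order']
```

Here `validated order` is both an output of the first step and an input to the second, so `required_data()` lists it next to the external inputs. A fuller model might separate internally produced data from data that must be sourced.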
Governance and Operations
Strategy and Structure
- We expected specialized technologies and new opportunities to do distributed computing; now we’re seeing organizations with flat IT budgets looking to take costs out of IT so they can focus on innovation. Data volumes are growing. You need to focus on the process rather than the data. Understand that data is the enabling layer but also the obstacle to pursuing microservices, AI, and IoT. Which data fabric strategy are you going to use to pursue new technologies such as multi-cloud, hybrid processes, and microservices?
- From my experience, it is imperative to develop a strategy incrementally, starting with key use cases that can benefit from big data technology before moving to broad, organization-wide projects. Beyond that, a successful strategy involves getting the right people in place who understand both the technology spectrum and the business goals, and picking the right architecture. Data streaming technology can be a huge help in creating an architecture that connects different data silos, data lakes, etc. Also important is putting the right governance policies in place for making data accessible across an organization.
- The most successful organizations that have adopted a big data strategy reported three observations. First, a unified, comprehensive, and flexible data management platform enables speed, reusability, and trust with far less manpower than traditional manual and complex approaches. Second, such a platform lets organizations focus manpower on business logic and business context rather than being delayed by the complexities of an ever-changing infrastructure ecosystem. Third, AI-driven technology can be leveraged to guide user behavior and automate processes.
Speed of Delivery
- Real-time analytics and fast data processing matter. Don’t fall into cluster sprawl; complexity can be high. Leverage Spark to innovate on big data more concisely. Simplify the architecture and improve performance. Focus on high performance so you can get answers quickly.
- Speed of delivery is a work in progress that’s not easily solved. Connect quickly with a self-service model that empowers business analysts and data scientists to be self-sufficient and independent.
- Pull together data sources to answer questions. Don’t get stuck using a single system. Leave the data where it lives naturally and do the analytics there; if you move the data into a data warehouse, it’s already out of date. Answer questions quickly, run different “what if” scenarios, and respond to questions in real time.
- Backup, recovery, and protection are essential, especially with the growth of ransomware. Data is business critical; treat it as such.
- The more metadata you have, the more it can work for you.
- Data operations ensure data is moving across the enterprise while you keep your finger on where it is and what it’s being used for: operational oversight, quality, and SLAs. Big data can be difficult for companies to use. Kafka is complex, and getting started with it can be difficult. We provide a UI that removes the requirement for upfront programming.
- Municipalities have been collecting data for a long time. That data has been collecting dust. We provide the resources for non-technical people to clean and work with the data.
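The advice to leave data where it lives and run the analytics there can be illustrated with partial aggregation. Each source computes a small summary locally, and only those summaries travel to a coordinator. The Python sketch below uses two hypothetical in-memory lists as stand-ins for separate systems. It computes a global average order value without copying raw rows into a central warehouse. It illustrates the general idea only and does not describe any particular vendor's federated query engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Summary a source can compute locally: enough to merge into a global mean."""
    total: float
    count: int

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)


def summarize_locally(order_values: list[float]) -> Partial:
    # Runs "at the source": only two numbers leave the system, not the rows.
    return Partial(sum(order_values), len(order_values))


def global_mean(partials: list[Partial]) -> float:
    """Combine per-source summaries into one mean at the coordinator."""
    combined = Partial(0.0, 0)
    for p in partials:
        combined = combined.merge(p)
    if combined.count == 0:
        raise ValueError("no rows in any source")
    return combined.total / combined.count


# Hypothetical data held in two separate systems.
crm_orders = [120.0, 80.0, 100.0]
web_orders = [40.0, 60.0]

partials = [summarize_locally(crm_orders), summarize_locally(web_orders)]
print(global_mean(partials))  # → 80.0
```

This works because a mean decomposes into a sum and a count that can be merged. Some statistics, such as an exact median, do not decompose this way and need more than a two-number summary per source. A “what if” scenario that changes one source only requires recomputing that source's partial.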
Here’s who we spoke to:
Opinions expressed by DZone contributors are their own.