To gather insights on the state of Big Data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "How can companies get more out of big data?" Here's what they told us:
Deliver Value Quickly
- The right data fabric allows you to be strategic and tactical at the same time. The right architecture can help you pursue low-hanging fruit and get tremendous returns. Don’t consider data an asset. Recognize data as a resource, put it to its highest use, and feed it into business operations so you can take more intelligent action and drive top-line revenue growth. This requires integrating real-time operations, analytics, and transactions.
- Make data immediately available and actionable. Instead of dumping all data in a data lake and trying to make use of it later (when the data is no longer as valuable), data streaming technology offers an architecture in which useful applications can be built directly on the data streams, acting on data as they arrive from all kinds of sources. That way, data applications are decentralized instead of being dependent on a central authority.
- Democratize data throughout the organization. Create dashboards for different departments that have clear, understandable takeaways (this is where natural language generation is especially useful), regardless of an individual’s data analysis skill level.
- Don’t build a data warehouse; that’s behind the curve. Reverse the perception by enabling clients to get real-time results quickly. Build apps that scale to hundreds of terabytes. Scalability at that level calls for more education in designing apps for big data scale. Understand the underlying data.
- Companies can place greater near-term priority on investigating business models that deepen their reach into market segments and geographies, on applying big data and analytics to build predictive and intelligent practices, and on shifting their perspective from "data as an asset" to offering "data products and services" they can monetize. This means investing in ways to:
  - Experiment with data-driven business models to convert data into services.
  - Analyze data and ecosystem trends to understand service demand, quality, performance, usage, and engagement.
  - Monetize data to generate new revenue streams and growth opportunities.
- For existing goals, projects, or objectives, document how they currently work if documentation doesn’t already exist. Start documentation for new goals, projects, or objectives. In each case, identify the time requirements for each step and the overall goals that determine success or completion. Generally, these time requirements involve acquiring and processing data, system or user interaction, and various forms of computation. (Big data analytics is about processing more data in the time allotted for making decisions.) Design and select technology that meets or exceeds the metrics identified. Use this data-driven decision-making power to drive, monitor, and augment every area of business the company participates in.
- Plenty of advanced research and development continues in ML and AI to get better intelligence from data, and rapid advancements will likely keep helping companies get more, but that’s not really my area of expertise. Fast intelligence, meaning putting your intelligence to use at the moment of decision, in real time, while responding to millions of events per second, is what I believe can make a categorical difference in what companies get out of their big data investment.
- The most successful organizations that have adopted a big data strategy reported these observations:
  - The use of automation to speed up development, delivery, and maintenance of data processes.
  - The use of middleware fabrics to abstract the business logic and business context of big data projects from the underlying execution.
  - The use of machine-learning-based technologies to discover and classify data assets and infer relationships between them.
  - The use of data quality, data masking, data lineage, and other data management processes to certify the trustworthiness of data assets.
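One of the points above, building applications directly on data streams rather than landing everything in a data lake first, can be illustrated with a minimal Python sketch. The event source, the field names, and the 5,000 limit are all hypothetical stand-ins chosen for illustration; a real deployment would read from a streaming platform rather than an in-memory generator.

```python
from typing import Any, Dict, Iterable, Iterator

Event = Dict[str, Any]


def card_payments() -> Iterator[Event]:
    """Hypothetical event source standing in for a real data stream."""
    yield {"account": "A", "amount": 40.0}
    yield {"account": "B", "amount": 9500.0}
    yield {"account": "A", "amount": 12000.0}


def flag_large_payments(events: Iterable[Event], limit: float) -> Iterator[Event]:
    """Act on each event as it arrives instead of storing it for later analysis."""
    for event in events:
        if event["amount"] > limit:
            # Emit an alert immediately; nothing is buffered or batched.
            yield {**event, "alert": "over_limit"}


if __name__ == "__main__":
    for alert in flag_large_payments(card_payments(), limit=5000.0):
        print(alert)
```

Because both functions are generators, each payment is evaluated the moment it is produced, which is the property the streaming argument relies on.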
Use the Cloud and New Toolsets
- Take your data and drop it into Azure. This gives you a way to analyze it so you can react to what the data is telling you. Use Azure for search, AI/ML, and cognitive services. If the data in a document meets your criteria, the document is flagged for closer inspection. We’re able to perform real-time analytics and flag a document if it’s violating a specific policy.
- Leverage the tools that simplify the process. People have a vision for the platform but then realize how hard it is to execute on the vision, the time and people required. Hadoop alone is 25 different projects.
- Make use of the public cloud platforms and the tools they provide. A lot of the offerings from AWS, Azure, and GCP remove the complexity, enabling companies to get value from their data rather than getting stuck.
- It takes time. Adjust expectations accordingly.
- The data lake is morphing beyond SQL. The technology to operationalize it is challenging. Look at use cases where customers see value. The data refinery process, from raw data to candidate data to the data warehouse, is very siloed.
- Figure out how to share data using orchestration, automation, lineage, and analytics. Data lakes are being converted into more sophisticated data warehouses and data fabrics, with more integrated technology and a separate persistence layer. Offerings from Google and AWS, including Redshift, bypass Hadoop to achieve outcomes.
- Use open source big data community toolsets such as TensorFlow and Spark, along with new innovations. There is more awareness that enterprise data will reside in zones to take advantage of silos.
- Unfulfilled promises are the result of the hype. You can get the right ROI if you know what you’re doing. You need to identify the right talent. You need a big data strategist who knows the techniques and how to apply them. Many companies identify the value in their data but are unable to extract it. You may need to look for talent outside or bring in someone to train your employees. The first step is always not being afraid of the change and daring to put some investment in the initiative. Fortunately, because several big data platforms are free and open source, and the use of public clouds such as Amazon Web Services and Azure is very inexpensive, the initial investment necessary is fairly limited and possibly amounts mostly to the time of a data analyst (or a data scientist) to “play” with the data and identify novel ways to use it. From this initial setup to a Minimum Viable Product there could be a non-linear path, since certain applications of data could require additional input from internal or external counsel to ensure that they don’t infringe on existing laws or overstep the boundaries of regulatory frameworks and, most important, that they don’t have ethical implications that could create significant problems for the brand or the organization down the road.
ID the Business Problem
- Technology is offering a lot in regard to compute and storage. Get your hands on the data and figure out what you want to accomplish. Define a business problem to solve. Begin with the end in mind. Start small with a clear desire for the outcome.
- Process records quickly. Converge unstructured data. Look at the analytics value chain. Think with the end in mind. What problem do you want to solve? Quit focusing on collecting data and think about the problems you are trying to solve.
- Manage and store data, but do it with a real business need in mind. The needs vary by vertical and industry. Retail is looking for next-best-offer analysis. Ad tech is looking for programmatic ad matching. Energy and manufacturing are looking for efficiency improvements. Financial services is looking for anomaly detection for fraud prevention.
- Have data on your data. Look at problems more holistically and think about the problems you want to solve. Build one platform to solve the problems progressively. Do not build five platforms to solve five problems.
- Empower people throughout the organization to have access to the data they need to make informed decisions.
- You must get your infrastructure in place. Getting real, repeatable value takes time because of the hardware setup steps involved.
- Get data into a digital format so it is easy to analyze. How are things changing over time? How do my conditions compare with my neighbor’s or with other municipalities my size?
- Companies can get more out of big data by successfully and efficiently analyzing data from different sources. The major breakthroughs in big data for companies started with Apache Hadoop more than a decade ago, bringing a much lower cost for storing and analyzing information, but typically in batch mode. Next came real-time processing of big data led by Apache Spark technology. Today the key to maximizing the value in big data is having a 360-degree view of a customer to serve them better. However, the information about the customer could be stored in many different storage systems (sales/customer service/shipping/payments), which makes it difficult to do this kind of analytics today. The challenge companies face now is how to unify and process big data in real-time from disparate data stores.
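As a rough illustration of the unification problem described in the last point, the following Python sketch merges records about the same customer from three separate systems into a single view. The record layouts, the field names (`customer_id`, `amount`, `shipped_on`), and the sample data are assumptions made for this example; real sales, customer service, and shipping systems would each need their own connector and a more careful identity-matching step.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical extracts from three separate storage systems.
SALES: List[Record] = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 60.0},
]
SUPPORT: List[Record] = [
    {"customer_id": 1, "ticket": "late delivery"},
    {"customer_id": 1, "ticket": "refund request"},
]
SHIPPING: List[Record] = [
    {"customer_id": 2, "shipped_on": "2018-02-10"},
    {"customer_id": 2, "shipped_on": "2018-03-01"},
]


def build_customer_view(
    sales: List[Record], support: List[Record], shipping: List[Record]
) -> Dict[int, Record]:
    """Combine per-system records into one profile per customer."""
    view: Dict[int, Record] = defaultdict(
        lambda: {"order_total": 0.0, "open_tickets": 0, "last_shipment": None}
    )
    for row in sales:
        view[row["customer_id"]]["order_total"] += row["amount"]
    for row in support:
        view[row["customer_id"]]["open_tickets"] += 1
    for row in shipping:
        profile = view[row["customer_id"]]
        # ISO-8601 date strings sort chronologically, so string comparison is enough here.
        if profile["last_shipment"] is None or row["shipped_on"] > profile["last_shipment"]:
            profile["last_shipment"] = row["shipped_on"]
    return dict(view)


if __name__ == "__main__":
    for customer_id, profile in sorted(build_customer_view(SALES, SUPPORT, SHIPPING).items()):
        print(customer_id, profile)
```

The sketch joins everything on a shared `customer_id`, which is the easy case. In practice, the hard part the article points to is that the stores rarely share a clean key and must be queried in real time rather than from static extracts.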
Here’s who we spoke to:
Emma McGrattan, S.V.P. of Engineering, Actian
Neena Pemmaraju, VP, Products, Alluxio, Inc.
Tibi Popp, Co-founder and CTO, Archive360
Laura Pressman, Marketing Manager, Automated Insights
Sébastien Vugier, SVP, Ecosystem Engagement and Vertical Solutions, Axway
Kostas Tzoumas, Co-founder and CEO, Data Artisans
Shehan Akmeemana, CTO, Data Dynamics
Peter Smails, V.P. of Marketing and Business Development, Datos IO
Tomer Shiran, Founder and CEO and Kelly Stirman, CMO, Dremio
Ali Hodroj, Vice President Products and Strategy, GigaSpaces
Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
Fangjin Yang, Co-founder and CEO, Imply
Murthy Mathiprakasam, Director of Product Marketing, Informatica
Iran Hutchinson, Product Manager and Big Data Analytics Software/Systems Architect, InterSystems
Dipti Borkar, V.P. of Products, Kinetica
Adnan Mahmud, Founder and CEO, LiveStories
Jack Norris, S.V.P. Data and Applications, MapR
Derek Smith, Co-founder and CEO, Naveego
Ken Tsai, Global V.P., Head of Cloud Platform and Data Management, SAP
Clarke Patterson, Head of Product Marketing, StreamSets
Seeta Somagani, Solutions Architect, VoltDB