To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "What’s the future of big data ingestion, management, and analysis from your perspective – where do the greatest opportunities lie?" Here's what they told us:
- We'll see the transition from on-prem to the cloud and subsequently see traditional Hadoop make the transition to the cloud. This will lead to higher adoption of AI/ML.
- Just drive the digitization agenda of the company. You have sufficient compute power and data – what could you do? Take advantage of the capability. Use AI/ML to filter through the data. Enable more people to get involved.
- Leverage big data and ML anomaly detection as more sensors enter the world. Cameras check for safety helmets, and ML models turn city sensor data into early warning indicators. The entire economy becomes information driven. Understand why anomalies might happen.
- 1) AI/ML becoming less hype and more of a trend. ML needs big data to work. Any ML requires big data. Big data is not useful by itself. Ability to have an engine automatically see trends and make suggestions on what to look at is valuable. 2) Expect more tools for visualizing and reporting on big data. Salesforce has Einstein. Tableau has a tool. Expect thousands more we haven’t seen yet. AI/ML will become more prevalent.
- AI protected systems. Maintain and keep the data safer. Create ethical and moral dilemmas for humans. Protect the data because at some point it will be turned over to machines which is terrifying because you don’t know what the machine may do with it and you cannot recover.
- The use of AI and ML technologies, like TensorFlow, provides the greatest future opportunities for big data applications. With AI, the computer uncovers patterns that a human is unable to see.
- We’re going to suffer a talent problem in organizations. Ability to make the value of the data visible to people who are not data scientists is an important factor to deal with. AI/ML will focus on making sense of data to provide answers to people. Context is also important — how can we create context and get it out of people’s heads?
- Maturing past the Hadoop data lake world. Hadoop is a workload that is good for some things and not good for everything. Everyone is taking a deep breath and recognizing which things Hadoop is good for. The same is true for the data lake. You have to go through the growing pains to figure it out. Opportunity increases as we get further into the world of AI, where the system is the one finding things. That's the reality, and we'll get there as an industry. There is a huge opportunity across data and workloads, but you have to scope it to specific use cases and workloads.
- Streaming for more real-time, faster ingestion and analysis. We are still in the early days of automated actions with the data.
- How to connect real-time data to get the most out of big data. Connect data and dots to explore and predict relationships.
- Recognition about the variety of data. Rationalize across all the different kinds of data. Aggregation of variables – credit bureau, core banking system, data on Hadoop. There's an opportunity with the proliferation of tools you can put in the hands of the data analysts or business users rather than relying on data governance or DBAs. Give people access to the data and the tools to manipulate.
- More maturity and more tools with the ability to interpret. 1) More data, more types, streaming more quickly. 2) Analytical methods used to process the data. 3) Automation of an insight.
- The trend is to make the common denominator across systems more SQL-centric through an API. SQL is how devs interact with data across different systems. Move to more open source and lower cost tooling as a visualization step. The difference between Power BI and Tableau is shrinking. Data-as-a-service makes tools for visualization less critical. There is an increasing role for the data steward as a bridge between analyst and data consumer, helping both to be more self-sufficient.
- There is a continued drive for standardization of data ingestion, with many companies looking to Kafka for on-premises or private cloud or Kinesis for AWS cloud. Data management and analytic tools then become sinks and sources for data for those data movement frameworks, which creates a sort of data utility grid for these companies, sort of like the electrical system in a house. If you need electricity, you just need an appliance with a standard plug and you plug in. The same is occurring with data access, and is already in place at some companies: if you need to use data or provide data to someone else, you just plug your application into the data grid using its standard “plug” (or interface). This will also allow for more “best of breed” components to be used, like the best BI or analytics tool or best database for a particular workload, rather than having to compromise on an inferior all-in-one product, since the data integration will be more standard than custom. Localization of data is a great opportunity, too. That is, having data located in the world where it is needed rather than needing to traverse long networks in order to retrieve it, process it, change it, or analyze it. That means more master-master, active-active architectures, which can create application challenges for any enterprise, so the right choice of components will be important.
- Leading companies are increasingly standardizing on mature open source technologies like Apache Kafka, Apache Ignite, and Apache Spark to ingest, manage, and analyze their big data. All of these projects have experienced major adoption growth in the past few years and it appears likely this will continue for the foreseeable future. As these technologies mature and become easier to install and use, they will create opportunities for those who know how to use and implement distributed computing technologies for an increasingly real-time world.
- Look at tagging, get the proper metadata models, ensure the context of the information. Tags and metadata draw context. Ensure proper metadata is wrapped around. Have traceability for reliability.
- Focus on operationalization driven by the continued emergence of streaming always-on technology. Complete understanding of what’s going on. The cloud drives this home where cloud-based application architectures are always on and being updated. The same needs to happen with data architectures with automation. Customers see themselves going down a data operations path.
- All three parts of big data can lead to a considerably successful project in terms of ROI as well as data governance. I would order them hierarchically. First, we need to be able to collect data in large amounts from many different sources. Once the data becomes available, the proper management, like the proper creation of informative KPIs, might already lead to some unexpected discoveries. Finally, after the data have been so transformed, their analysis produces even further insights that are vital for the company business. So, as you see, you already get information from step 1. But you can't get to step 2 without having completed step 1 first.
- This will all become easier. Things that are challenging today will become second nature and automated in the future. Accessing big data will be as easy as anything else we do on a computer. Handling, moving, and connecting data will have far less friction. Using big data to identify the value proposition within the data is where the opportunities lie for each business.
- Augmented analytics pulling together natural language, data, and analytics to drive answers. How do we get to analyzing based on identifying what you don’t know to query?
- Data analysts and scientists don’t care where the data is, they just want the data and the tools they need to analyze it. First, catalog the data and know where it is. The next step is getting the data where the user wants it. Build a virtual catalog to provide access and delivery. There’s a logical progression to what we're doing.
- Whether on-prem or in the cloud, companies need to ensure the engine keeps working so they can get value. An as-a-service model doesn’t automatically solve the problems. You need to know about and manage performance problems. Bring performance transparency. Think through security from end to end.
- The future is big data analytical platforms that provide proven capabilities for ingestion, management, and analysis at the speed and scale that enables businesses to compete and win. The greatest opportunities are for businesses to no longer be constrained by the imagination of the business in getting accurate insight so that they can act on all opportunities – understand exactly which customers are likely to churn and grow your business, establish entirely new business models based on the revenue-generating capabilities of data (think pay-as-you-drive insurance, as an example). Every single industry can differentiate itself based on the valuable insight of the data. Make an investment in a proven data analytical platform that offers no compromises and prepares you for whatever the future holds in terms of deployment models (clouds, on-premises, on Hadoop, etc.) or seamless integration with emerging technologies in the data ecosystem.
- The greatest opportunities lie in delivering true agile data engineering processes that allow companies to quickly create data pipelines to answer new business questions without requiring business people to depend on IT. This requires the automation of the end-to-end development, operationalization, and ongoing governance of big data environments in an integrated fashion. The key to success is automating away the complexity so organizations can use people with basic SQL and data management skills to fully leverage big data for competitive advantage.
- There is a very bright future ahead for all of these. One area of great opportunity is in the IoT arena. There are over 9 billion devices deployed and the rate of deployment is speeding up as the cost of devices decreases and the sophistication of devices increases. This device data requires very high-speed ingestion and robust management. It is also ripe for advanced analytics such as machine learning for outlier detection.
- We see three mission-critical opportunities in the future of data-driven marketing and sales. 1) Cord-Cutters: Our clients’ customers are more mobile and digital than ever. Traditional data elements and IDs such as home phone, home address, business extension, etc. have to be complemented with digital IDs such as mobile phone number, GPS coordinates, cookie ID, device ID, MAIDs, etc. 2) Predictive World: Artificial intelligence is woven throughout our everyday lives and experiences. Our phones predict the next few words in the sentence we are texting. Our thermostats predict what temperature is optimal for personal warmth and cost savings. Our cars brake for us before an accident happens. Consumers now expect marketing and sales experiences will also be predictive, using data and intelligence to improve their brand experiences in real-time. 3) B2B2C Life: There is a blending of our business and consumer selves. Research shows that approximately 43% of consumers work remotely and the number of people who spend more than 50% of their time working at home has grown 115% over the past 10 years. Therefore, marketers must be able to connect the data IDs, attributes, and behaviors of individuals rather than rely on siloed B2B or B2C targeting.
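One respondent above points to machine learning for outlier detection on IoT device data. As a rough illustration of the idea only, here is a minimal Python sketch that flags unusual sensor readings using a modified z-score based on the median absolute deviation. This is a simple statistical stand-in, not an ML model and not any respondent's method; the function name `mad_outliers`, the sample temperatures, and the 3.5 threshold are all assumptions made for the example.

```python
from statistics import median


def mad_outliers(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold.

    Hypothetical helper for illustration; 3.5 is a commonly used cutoff,
    not a value taken from the article.
    """
    med = median(readings)
    abs_dev = [abs(x - med) for x in readings]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        return []  # no spread to compare against, so flag nothing
    # 0.6745 scales the MAD so the score is comparable to a standard z-score
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up temperature readings from a single device; one value is clearly off.
temps = [21.0, 21.3, 20.9, 21.1, 35.7, 21.2, 20.8]
print(mad_outliers(temps))  # prints [4]
```

A median-based score is used here because a single extreme reading distorts the mean and standard deviation, while the median barely moves. A production system ingesting billions of device readings would more likely compute something like this over sliding windows in a streaming engine.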
Here’s who we spoke to:
- Cheryl Martin, V.P. Research Chief Data Scientist, Alegion
- Adam Smith, COO, Automated Insights
- Amy O’Connor, Chief Data and Information Officer, Cloudera
- Colin Britton, Chief Strategy Officer, Devo
- OJ Ngo, CTO and Co-founder, DH2i
- Alan Weintraub, Office of the CTO, DocAuthority
- Kelly Stirman, CMO and V.P. of Strategy, Dremio
- Dennis Duckworth, Director of Product Marketing, Fauna
- Nikita Ivanov, founder and CTO, GridGain Systems
- Tom Zawacki, Chief Digital Officer, Infogroup
- Ramesh Menon, Vice President, Product, Infoworks
- Ben Slater, Chief Product Officer, Instaclustr
- Jeff Fried, Director of Product Management, InterSystems
- Bob Hollander, Senior Vice President, Services & Business Development, InterVision
- Ilya Pupko, Chief Architect, Jitterbit
- Rosaria Silipo, Principal Data Scientist and Tobias Koetter, Big Data Manager and Head of Berlin Office, KNIME
- Bill Peterson, V.P. Industry Solutions, MapR
- Jeff Healey, Vertica Product Marketing, Micro Focus
- Derek Smith, CTO and Co-founder and Katie Horvath, CEO, Naveego
- Michael LaFleur, Global Head of Solution Architecture, Provenir
- Stephen Blum, CTO, PubNub
- Scott Parker, Director of Product Marketing, Sinequa
- Clarke Patterson, Head of Product Marketing, StreamSets
- Bob Eve, Senior Director, TIBCO
- Yu Xu, Founder and CEO, and Todd Blaschka, CTO, TigerGraph
- Bala Venkatrao, V.P. of Product, Unravel
- Madhup Mishra, VP of Product Marketing, VoltDB
- Alex Gorelik, Founder and CTO, Waterline Data
Opinions expressed by DZone contributors are their own.