2019 Executive Insights on Big Data
We spoke with experts and executives from all across the software industry to get their take on big data and what the future has in store.
This article is featured in the new DZone Guide to Big Data: Volume, Variety, and Velocity. Get your free copy for insightful articles, industry stats, and more!
To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. Here's who we spoke to:
Cheryl Martin, V.P. Research Chief Data Scientist, Alegion
Adam Smith, COO, Automated Insights
Amy O'Connor, Chief Data and Information Officer, Cloudera
Colin Britton, Chief Strategy Officer, Devo
Alan Weintraub, Office of the CTO, DocAuthority
Kelly Stirman, CMO and V.P. of Strategy, Dremio
Dennis Duckworth, Director of Product Marketing, Fauna
Nikita Ivanov, Founder and CTO, GridGain Systems
Tom Zawacki, Chief Digital Officer, Infogroup
Ramesh Menon, Vice President, Product, Infoworks
Ben Slater, Chief Product Officer, Instaclustr
Jeff Fried, Director of Product Management, InterSystems
Ilya Pupko, Chief Architect, Jitterbit
Bob Hollander, Senior Vice President, Services and Business Development, InterVision
Rosaria Silipo, Principal Data Scientist, and Tobias Koetter, Big Data Manager and Head of Berlin Office, KNIME
Bill Peterson, V.P. Industry Solutions, MapR
Jeff Healey, Vertica Product Marketing, Micro Focus
Derek Smith, CTO and Co-founder, and Katie Horvath, CEO, Naveego
Michael LaFleur, Global Head of Solution Architecture, Provenir
Stephen Blum, CTO, PubNub
Scott Parker, Director of Product Marketing, Sinequa
Clarke Patterson, Head of Product Marketing, StreamSets
Yu Xu, Founder and CEO, and Todd Blaschka, CTO, TigerGraph
Bala Venkatrao, V.P. of Product, Unravel Data
Madhup Mishra, V.P. of Product Marketing, VoltDB
Alex Gorelik, Founder and CTO, Waterline Data
Key Findings
1. While there were more than two dozen elements identified as being important for successful big data initiatives, identifying the use case, having quality data, and having the right tools were mentioned most frequently. Choose the right project — a well-defined problem that’s a pain point. Define the most critical use cases and identify where data is holding you back from solving those use cases. Have a clear set of goals and know the business decisions you are trying to drive.
Have reliable and valid data, since the level of trust in your work will be a function of the level of reliability and the level of use. Data accuracy and correctness are critical for every data project. Inventory data sources and assess data quality. A data-driven organization needs to be able to take action with complete accuracy by relying on a purpose-built, high-performance, open data analytical platform. Without the highest level of data accuracy and integrity, analysis and targeting will not be effective.
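As a rough illustration of what "inventory data sources and assess data quality" can look like in practice, the sketch below computes the share of missing or empty values per required field. The function name `profile_quality`, the field names, and the sample records are all hypothetical; none of the respondents described a specific method.

```python
from collections import Counter


def profile_quality(records, required_fields):
    """Return, per required field, the fraction of records whose value is missing or empty."""
    missing = Counter()
    for record in records:
        for field in required_fields:
            value = record.get(field)
            # Treat absent keys, None, and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    total = len(records)
    return {
        field: (missing[field] / total if total else 0.0)
        for field in required_fields
    }


# Hypothetical sample data, for illustration only.
customers = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "DE"},
    {"id": 3, "email": None, "country": None},
    {"id": 4, "email": "d@example.com"},
]

report = profile_quality(customers, ["id", "email", "country"])
print(report)
```

A completeness check like this covers only one dimension of quality. Accuracy and correctness usually also require validation against a trusted reference.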
Leverage tooling to simplify data processing and analysis and to make more progress faster. Have the right tools to ingest, transform, analyze, and visualize the results. It’s important to have the flexibility to look at the data using multiple tools and data models.
2. The two most popular ways to secure data are encryption and controlling authorization and access. We encrypt data when transferred and store encrypted data on disk. We never have unencrypted data, so it’s never at risk. Data is encrypted in transit and at rest using industry-standard encryption ciphers. Self-encrypting disk drives can be used on database servers for that level of protection within the data centers themselves.
Control data access so only users with proper permissions have access. Enterprise authorization and authentication frameworks enforce that.
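A minimal sketch of permission-based access control might look like the following. The role names, the `ROLE_PERMISSIONS` table, and the helper functions are assumptions made for illustration. A real deployment would typically delegate these decisions to an enterprise authorization and authentication framework rather than an in-code table.

```python
# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read"},
    "engineer": {"read", "write"},
    "admin": {"read", "write", "grant"},
}


class AccessDenied(Exception):
    """Raised when a user lacks the permission needed for an action."""


def authorize(user_roles, action):
    """Return True if any of the user's roles permits the action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


def read_dataset(user_roles, dataset):
    """Return the dataset only if the caller holds the 'read' permission."""
    if not authorize(user_roles, "read"):
        raise AccessDenied("read permission required")
    return dataset


rows = read_dataset(["analyst"], [{"id": 1}])
print(rows)
```

Unknown roles map to an empty permission set, so access is denied by default.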
3. The most frequently mentioned languages, tools, and frameworks were Python, Spark, and Kafka. TensorFlow, Tableau, and PowerBI were also mentioned by several respondents.
4. Use cases span a dozen industries, with financial services, retail, and healthcare mentioned most frequently. The most frequent use cases were security/fraud prevention and customer insight/understanding. Fraud is a major issue, and big data is being used for anomaly detection in healthcare and financial services. Financial services is a predominant industry because money is involved. One example is screening large credit customers for fraud with a two-millisecond response time.
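To make the idea of anomaly detection concrete, here is a small sketch that flags unusual transaction amounts using a modified z-score based on the median absolute deviation. This is one common, simple technique chosen for illustration. The respondents did not say which methods they use, and the function name, threshold, and sample amounts are assumptions. It is also a batch calculation, not a system with millisecond response times.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that resists outliers.
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        # All values are (nearly) identical, so nothing can be scored.
        return []
    # 0.6745 scales the MAD to be comparable with a standard deviation
    # under a normal distribution.
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical transaction amounts, for illustration only.
transactions = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 980.0]
suspicious = flag_anomalies(transactions)
print(suspicious)
```

Median-based statistics are used here because a single very large transaction would inflate a mean and standard deviation enough to hide itself.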
Financial services companies have new regulations with real-time reporting requirements. The Fundamental Review of the Trading Book (FRTB) regulations require financial services companies to calculate their portfolio value and risk exposure in real time.
5. Failure of big data initiatives is a function of lack of security, lack of skills, poor definition of the business problem, and inability to scale. If you don’t know the lineage of the data and cannot ensure it’s secure, you are asking for trouble. Security needs to enforce policy. Do not put security policy definitions in the hands of developers. Start by standardizing data governance, security, and access control.
The biggest challenge of big data initiatives, like all data analytics projects, is the recruitment of qualified employees. Not having people with the right skills can lead to a complex path or failure. Know what your people are capable of. Organizations don’t have the technical expertise or engineering capacity to keep up with all the changes today’s data-driven economy requires.
Figure out the problem to be solved before deciding on the technology you choose. Have a clear business objective. Not having clear, precise goals for any data project is a common failure. Organizations don’t spend the time to understand, categorize, and tag their information. People just take shortcuts and dump or keep information en masse. They are mismanaging the process because they do not know what they have.
You need to understand how the technology scales. To achieve scalability, you will need to build your application a certain way. The challenge a lot of projects have is the difficulty of testing scale, volume, and variety with a PLC. A common failure is underestimating demand for data and then trying to scale up a compromised architecture after the fact to deal with larger demand and a broader user base.
6. Concerns regarding the state of big data revolve around security and governance, data quality, the amount of data, and the need to have a specific business case. There are huge security challenges to moving so much data around: fake data generation, insider attacks, and API vulnerabilities. Employees often have access to data they should not have access to, which increases the human error factor. Internal breaches are more common and worrisome than external ones.
There’s not enough emphasis on data quality and contextual relevance. We need to think about the lifecycle of information for quality, proper governance, and enforcement of governance. The rate of data growth and the number of new data sources are both increasing. Be forward thinking about the business case for the data. The biggest challenge for big data today is identifying how we will derive value from the data fast enough to inform real-time decision making.
7. The future of big data is artificial intelligence (AI) and machine learning (ML), along with streaming data and more mature toolsets. We’ll see higher adoption of AI/ML, using it to filter through data and enabling more people to get involved with data science. AI/ML is becoming less hype and more of a trend. Big data is not very useful by itself. The use of AI/ML technologies like TensorFlow provides great opportunities to uncover patterns a human cannot see. AI/ML will focus on producing sensible answers for people.
We’ll see the continued emergence of streaming, always-on technology, along with more tools for visualizing and reporting on big data. More mature tools with the ability to handle more data, more data types, and streaming data will arise quickly.
8. The primary thing developers need to keep in mind when working on big data projects is the business problem they are trying to solve and how what they are doing will add value to the business and improve the user experience of the customer. Focus on what matters and partner with business to solve problems. Think about the business context of what you are doing. Understand the regulations and constraints around the data. Understand the business outcome you’re working on and identify the business partner to help realize the value.
Developers need to focus on how they can provide value to their specific business in response to their particular industry rather than spending all of their time trying to build functionality they can get from the market.
Opinions expressed by DZone contributors are their own.