What Scares You About Big Data?

Security is becoming more important as we become more reliant on data and the data becomes more distributed.

To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "What are your biggest concerns regarding the state of big data today?" Here's what they told us.


  • Security becomes more important as we become more reliant on big data and as that data becomes more distributed. A higher percentage of the data becomes critical.
  • Security in light of recent breaches. These are a function of human failure to follow best practices.
  • Standards have not fully emerged. It's still a bit of the Wild West, with new approaches and technologies.
  • There is a lot of education to be done about what the problem is and what the right solution is for the problem at hand. We’re a long way from ever having one general-purpose database. Know what the technology is good for as well as what it is not good for. Security is a big deal, but it cannot be regulated. Consumers may stop doing business with companies that are deemed to be unethical. This is rapidly evolving. The biggest concerns center on the ethical use of big data and the potential for misuse. Big data could be abused too easily by dictators and criminals, with the potential for very large-scale damage to society. Examples abound nowadays: questions around the influence of certain government-sponsored actions on democratic elections in certain countries, the abuse of big data to control large swaths of the population in dictatorial regimes, and the unethical use of big data by corporations to affect consumer decisions.


  • Ethics: data is used without forethought about its implications for the individual, which leads to erroneous use and behaviors. Never be completely dependent on AI/ML for decision-making, and keep data secure.


  • Adoption has been slow. It seems like we’re two years behind and just getting started.
  • The hype and expectations are not in alignment with the complexity and reality.
  • We haven’t seen results materialize as quickly as promised.


  • The ecosystem is complex and messy, and therefore hard to learn. Alluxio with its partners. The data is stored in various storage systems and is hard to manage and retrieve. The applications are hard to replicate. Application vendors and cloud vendors. There are many types of environments, and it's hard for vendors to make their software run smoothly in all of them.


  • Companies are not realizing the possibilities to add value to the business with cognitive and AI/ML technologies. They rely on current information governance and intelligence archives rather than being flexible. AI/ML will make your current structure obsolete. Retrieval of data is painful; AI/ML makes it easy.
  • The technology has matured. There are many viable options, both open source and proprietary. It’s easy to fall into the trap of cluster sprawl and building another monolith. You are not limited by design and will fall back into Conway’s Law if you’re not careful.
  • Nothing is going to stop data. It’s the primary asset and it will continue to grow. The tech stack will continue to innovate to keep up with the growth of data.
  • Developers are building applications within self-contained teams and need infrastructure that supports this model to make the process as efficient as possible. Orchestration platforms like Kubernetes along with container technologies help to enable this.
  • Customer experience is now the most important digital initiative, followed by building a single customer data view and customer journey management. Enterprises are switching their focus from internal resource management to external customer experience. They now perceive their customers and their ecosystem, rather than their traditional internally focused R&D organizations, to be a key source of co-innovation. This implies that big data must be more collaborative than ever, encouraging participation, sharing of data, and co-innovation. But too often, businesses get in their own way by refusing to create a culture around data and by not prioritizing the proper funding and staffing for data management. It's also still a real challenge to create a trusted environment with the enterprise's ecosystem in order to capture valuable data from partners, customers, or other stakeholders to improve the customer journey. In a customer experience network, the synergistic value created by the network is greater than the sum of its parts. It provides enterprises with the means to “crack the code” to deliver superior customer experiences.
  • Slow, manual, one-off efforts that are discarded and require rework: an AI-driven data platform lets you reuse the rules, policies, and business logic that cleanse, master, govern, and secure data, so each iteration of your big data effort can leverage the learnings and investments from the prior one. Too much time spent finding data: an AI-driven enterprise data catalog automatically discovers data assets, minimizing this challenge. No common, authoritative set of data assets for everyone to use: ensuring the data in your data lake is cleansed, mastered, and governed confirms it’s certified “fit for use” across the organization. Preparing and cleaning data takes weeks, leaving insufficient time for analytics: AI-driven data prep capabilities fast-track analysts’ ability to get insights from raw data. Data lakes become data swamps when the data is inaccurate, incomplete, and without context: a governed data lake ensures the data remains fit for use.

Here’s who we spoke to:

  • Emma McGrattan, S.V.P. of Engineering, Actian
  • Neena Pemmaraju, VP, Products, Alluxio Inc.
  • Tibi Popp, Co-founder and CTO, Archive360
  • Laura Pressman, Marketing Manager, Automated Insights
  • Sébastien Vugier, SVP, Ecosystem Engagement & Vertical Solutions, Axway
  • Kostas Tzoumas, Co-founder and CEO, Data Artisans
  • Shehan Akmeemana, CTO, Data Dynamics
  • Peter Smails, V.P. of Marketing and Business Development, Datos IO
  • Tomer Shiran, Founder and CEO, and Kelly Stirman, CMO, Dremio
  • Ali Hodroj, Vice President Products and Strategy, GigaSpaces
  • Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
  • Fangjin Yang, Co-founder and CEO, Imply
  • Murthy Mathiprakasam, Director of Product Marketing, Informatica
  • Iran Hutchinson, Product Manager & Big Data Analytics Software/Systems Architect, InterSystems
  • Dipti Borkar, V.P. of Products, Kinetica
  • Adnan Mahmud, Founder and CEO, LiveStories
  • Jack Norris, S.V.P. Data and Applications, MapR
  • Derek Smith, Co-founder and CEO, Naveego
  • Ken Tsai, Global V.P., Head of Cloud Platform and Data Management, SAP
  • Clarke Patterson, Head of Product Marketing, StreamSets
  • Seeta Somagani, Solutions Architect, VoltDB

