
Concerns With Big Data


Concerns regarding the state of big data revolve around security, privacy, quality, volume, and the business problem to be solved.



To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "Do you have any concerns regarding the state of big data?" Here's what they told us:

Security

  • The whole approach of moving data around brings security challenges: fake data generation, insider attacks, and API vulnerabilities.
  • I worry about internal failure more than external. Employees have access to data they should not have access to. There is also the human error factor: people who are not well trained, or who are complacent, create holes in the process.
  • Security and privacy. A physical or virtual data lake holds a lot of very important things.

Quality

  • Not enough emphasis on quality and contextual relevance. The trend with technology is collecting more raw data closer to the end user. The danger is that data in raw format has quality issues. Reducing the gap between the end user and raw data increases issues in data quality. It's great the middle is being streamlined, but the raw data has a quality issue. Maintain focus on quality data. Once you start handing over processing to AI/ML, you need an understanding of the data. The quality, format, and context of the data become more important.
  • The lifecycle of information for quality and proper governance and enforcement of governance. Properly approved; what’s a record? How can we manage compliance perspectives in new records? Reliability, quality, and compliance equals governance.
  • As analytics speeds up, there is a need for faster access to data. Humans are starting to be removed from the process. Where is the oversight? How do we know the data being used to drive analytics and operations should be used? How do we know that the algorithms are proper and ethical and unbiased and that they are continuing to execute in those ways? What happens when “bad data” gets into the system, even accidentally? Will it be discovered and rejected, or will it be processed with all resulting actions being tainted? Those are some concerns for where we are with big data right now and issues that need to be addressed.
  • Data integrity. Ensuring error-free, or “clean,” data from reliable sources must be a priority for data providers and our clients. Data with low integrity compromises the accuracy of business analytics and intelligence. The lower the accuracy, the less effective the targeting and conversion of the right audience, and the greater the risk of decreased customer satisfaction.
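
One respondent asks whether “bad data” will be discovered and rejected or processed along with everything else. A minimal sketch of the "discover and reject" option is shown here in Python. The rule names, the record fields (`customer_id`, `amount`, `source`), and the set of trusted sources are all hypothetical, chosen only for illustration; none of them come from the interviews.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical trusted sources; a real pipeline would load these from config.
TRUSTED_SOURCES = {"crm", "billing"}


def _valid_amount(record: dict) -> bool:
    """True when 'amount' is a non-negative int or float (bools excluded)."""
    amount = record.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    return amount >= 0


# Each rule is a (description, predicate) pair applied to one record.
RULES: list = [
    ("customer_id present", lambda r: bool(r.get("customer_id"))),
    ("amount is a non-negative number", _valid_amount),
    ("source is trusted", lambda r: r.get("source") in TRUSTED_SOURCES),
]


@dataclass
class Rejected:
    """A record that failed validation, with the rules it broke."""
    record: dict
    reasons: list


def validate(records: list) -> tuple:
    """Split records into (accepted, rejected) instead of processing them blindly."""
    accepted, rejected = [], []
    for record in records:
        failures = [name for name, check in RULES if not check(record)]
        if failures:
            # Quarantine with reasons so a human can review what went wrong.
            rejected.append(Rejected(record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Keeping the rejected records together with the reasons they failed, rather than silently dropping them, leaves room for the human oversight the respondents are asking about.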

Amount of Data

  • Looking at what you can do with fresh data and how to apply it. The rate of new data is growing; how can we apply that to what we are currently doing? One foot is on the path and one is in the future. How can we use new data to innovate? Also, forward thinking about the business case for the data. Executives struggle with answering the question of what they want to do with the data, i.e., how to make use of the data in a good way.
  • I believe data can make a huge difference for companies and humans. There’s just too much of it. Billions of fields. We must document the data to be able to get value from it. The data is beyond the ability of humans to manage and make sense of. You end up with unpredictable results and famous failures. Prevent failure by getting the plumbing in place so the data is usable.

Business Case

  • More concerned with the hyperbole around AI/ML. We need to get back to solving problems and creating value. Big data in general has been through that arc; AI/ML is now in it. We need to create value from data.
  • The biggest challenge for big data today is often how to derive value from the data fast enough to drive real-time decision making. This is one of the reasons we are seeing a high rate of growth in adoption of in-memory computing solutions which provide the speed and scalability companies need to achieve their big data goals.
  • One concern is market disillusionment. There has been so much hype about big data that some organizations have unrealistic expectations, and as the hype has morphed into hype about machine learning and AI, there is a risk that projects will lose their mandate or that failed projects will cause a backlash. This is particularly true with data lake initiatives, which too often start without a clear application in mind and become data swamps that don’t visibly deliver value.

Other

  • The most interesting thing is the ongoing conversation around commercial roles and open source. The industry is not settled on the best way to go. We see varieties of open core and support contracts. AWS is taking over open source and providing it as a service. What’s the model to allow commercial entities to generate revenue while contributing back?
  • I personally worry about the ethical treatment of data. We're still in a mode where we're eager to get our arms around everything versus looking at the long-term implications of how data is being used. What’s acceptable and what’s not? Businesses are where some of these acquisitions in open source are going (Red Hat, Cloudera); how does the platform space evolve from there? At the end of the day, big data as a concept survives. How it is implemented is likely to change.
  • Just a few days ago, we got news of the merger between two historical players in the field of big data. The entry of historical cloud technology players into this field might bring some changes to current big data technology, such as the trend toward hosted big data frameworks like Amazon EMR or Azure HDInsight instead of on-premises data centers.
  • AI is used too often. There needs to be human involvement in defining the problem, interpreting the result, and applying the result.
  • As companies move to cloud services that abstract complexity away, cost can get out of control. They can get stuck and be unable to extract themselves from the service.
  • People who know how are using it effectively; they have the right people doing infrastructure right. Smaller customers don’t have the tools or the infrastructure, so they are going to the cloud service model. They need sophistication and tooling to get the level of performance they were expecting. Make sure the technology is relevant for on-prem and cloud use cases.
  • The state of big data is in flux, with a lot of experimentation going on. Here are my top three concerns with where it’s headed: 1) Collapsing of the Hadoop market – While touted as a silver bullet that offers an economic solution to big data, Hadoop hasn’t lived up to its hype, and we see all the vendors pivoting to AI and ML next. 2) Buzzword bingo – Another concern I have regarding big data is that all solutions sound the same. One of the things I keep hearing from our customers is that they need to try things before they buy. They see the “buzzword bingo” played with so many big data vendors that they won’t trust any of them going forward. 3) NoSQL not living up to its hype – NoSQL claims to address the web-scale issues that plagued RDBMSs for 40+ years with its scale-out architecture. However, NoSQL databases are starting to fail just like Hadoop. They give up SQL and ACID in the process of scaling out. That’s like throwing the baby out with the bathwater and not something customers want.
  • It’s undeniable that big data will continue to grow and grow. That’s a challenge and an opportunity for businesses. It’s challenging in that there’s a cost to capture, store, and manage increasingly large volumes of data. So, some organizations delete or simply ignore data from, say, manufacturing equipment, due to the costs. That’s understandable, but the old adage rings true: businesses need to spend money to make money. And, even more importantly, traditional enterprises may save money by not investing in their big data analytical initiatives, but they risk losing market share and facing eventual extinction at the hands of well-funded data unicorns. You only need to look at Uber as an example of a data-hungry disruptor that may completely revamp the transportation industry as we know it today. So, the concern for me is that organizations that don’t invest in data analytics platforms that can analyze data where it resides, at massive scale, may be missing out on the opportunity of a lifetime to use data as a differentiator.
  • The biggest concern is related to the stalled projects caused by organizations thinking they can do it all themselves. Data initiatives that become stuck remain stuck because organizations think success simply isn’t possible. Meanwhile, the market continues to evolve with a greater amount of automation available now than there was even a year ago. Tools are available that can help these companies succeed without requiring an army of engineering experts. They simply need to be educated that what wasn’t possible a year ago may be possible now because more of the data engineering processes have been automated.

Here’s who we spoke to:


Topics:
big data, data cleaning, data quality, data security

Opinions expressed by DZone contributors are their own.
