
How Big Data Causes Big Problems for the WAN


Generally, the more data you collect, the better. But when you need to deal with WAN-connected sites, lots of new data is moving over the same old networks, which slows everything down.


Who doesn't love data these days? For business leaders, the more data you can collect, the better. Some might resist certain technologies in digital transformation, but it seems everyone can agree that big data analytics makes business life easier.

Everyone except IT leaders, that is. While business leaders add ever-increasing data inputs with IoT sensors, IT teams are left to find new ways of supporting complicated big data infrastructures without much guidance or extra budget. That, coupled with a growing number of WAN-connected sites, means a lot of new data is moving over the same old networks.

To say that big data is stressing your WAN would be an understatement. Networking for big data is a new consideration for a lot of IT teams. However, it's not enough to just implement more WAN optimization tools to solve the problem.

Growing big data efforts are causing big problems. The answer isn't a simple appliance; it's an overhaul of your approach to WAN management and network monitoring in general.

The Problem: Exponential Growth of East-West Traffic

Your typical WAN optimization solutions were built with north-south network traffic in mind. Remote and branch-office employees needed seamless application performance with direct, uninterrupted access to central servers.

This north-south optimization is still important when it comes to networking for big data, as end users rely on applications to derive insights. But when it comes to your WAN performance, the real problems are rooted in the east-west network traffic that supports big data collection and analysis.

Unlike north-south traffic, which can tolerate a degree of latency, east-west big data traffic must move at high network speeds. The value of big data lies in real-time processing and data delivery. If you don't have real-time access to data, the insights you derive will be outdated by the time you can make use of them. Real-time data delivery will become increasingly important to enable agile decision making and automation, especially as the Internet of Things takes hold in your business.

The machine-to-machine communication required for big data will impact your WAN planning on multiple fronts:

  • Bandwidth management: Regardless of the source, collecting ever-increasing amounts of data eats up network bandwidth. If bandwidth is strained and you can't pinpoint the precise reason, end-user experience will suffer and big data's value will disappear.
  • Data replication: All the big data in the world won't help you if disaster strikes and the data is unprotected. Data replication is critical for analytics effectiveness. But at the same time, the more resources you devote to big data replication, the more bandwidth you consume. Is your WAN capable of handling both big data collection and replication?
  • Data accuracy: Data corruption isn't a new challenge for IT leaders. However, it may have been easier to identify in the past, when there weren't so many servers supporting east-west traffic. Big data will exacerbate the problem, and if WAN managers can't identify corrupt data (and its source), the business will suffer.
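To make the collection-versus-replication question concrete, here is a rough back-of-the-envelope sketch in Python. The function names, the `replica_copies` parameter, and the 70% usable-link figure are illustrative assumptions, not values from any particular product, and the calculation uses daily averages, so it ignores bursts.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_mbps(bytes_per_day: float) -> float:
    """Average rate, in megabits per second, needed to move bytes_per_day."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1_000_000


def wan_headroom_mbps(link_mbps: float,
                      collected_bytes_per_day: float,
                      replica_copies: int,
                      usable_fraction: float = 0.7) -> float:
    """Capacity left over after collection and replication traffic.

    replica_copies: how many times each collected byte crosses the WAN
        again for replication (hypothetical parameter).
    usable_fraction: share of the link available to big data traffic;
        the rest is assumed to be reserved for end-user applications.
    """
    collection = required_mbps(collected_bytes_per_day)
    replication = collection * replica_copies
    return link_mbps * usable_fraction - (collection + replication)


if __name__ == "__main__":
    # Example: 500 GB collected per day, one remote replica, 100 Mbps link.
    headroom = wan_headroom_mbps(100, 500e9, replica_copies=1)
    print(f"Headroom: {headroom:.1f} Mbps")  # a negative value means the link is oversubscribed
```

In this example the link comes up short even on average, before any traffic peaks are considered, which is the kind of result that should prompt a closer look at what is actually crossing the WAN.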

The challenge that underlies each of these management pain points is that application usage must be tied to big data demands. A simple WAN optimization tool won't cut it; a broader monitoring mindset is essential.

Networking for Big Data: The Value of Visibility

One of the best things WAN managers can do as big data stresses their networks is to double down on QoS enforcement. You want complete control over traffic prioritization to ensure end users don't feel the pain of any potential big data-related problems on the back end.
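As a toy illustration of what traffic prioritization means, the Python sketch below models a strict-priority scheduler. Real QoS is enforced on routers and WAN devices, not in application code; the traffic class names and their ordering here are hypothetical and only show the idea that end-user traffic is served before back-end big data transfers.

```python
import heapq
from itertools import count

# Hypothetical traffic classes; a lower number is served first.
PRIORITY = {"interactive": 0, "replication": 1, "bulk_ingest": 2}


class PriorityScheduler:
    """Strict-priority queue: always sends the highest-priority packet next."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker keeps FIFO order within a class

    def enqueue(self, traffic_class: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[traffic_class], next(self._seq), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    sched = PriorityScheduler()
    sched.enqueue("bulk_ingest", "sensor batch")
    sched.enqueue("interactive", "dashboard query")
    sched.enqueue("replication", "replica block")
    while len(sched):
        print(sched.dequeue())
```

Strict priority can starve the lowest class when the link is saturated, which is one reason production QoS policies typically combine prioritization with bandwidth guarantees per class.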

But even more important is the ability to pinpoint which big data processes and applications are causing bandwidth and integrity problems. No network is perfect, so it would be unreasonable to expect big data to never cause a network problem. However, you can still take the steps necessary to guarantee visibility throughout your application portfolio.
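One simple form of that visibility is a "top talkers" report: summing bytes per application to see what is consuming the WAN. The sketch below assumes flow records have already been exported and reduced to `(application, bytes)` pairs; that record shape and the application names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(flows: Iterable[Tuple[str, int]], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n applications that moved the most bytes, largest first."""
    totals = Counter()
    for app, num_bytes in flows:
        totals[app] += num_bytes
    return totals.most_common(n)


if __name__ == "__main__":
    # Hypothetical flow records: (application name, bytes transferred).
    flows = [
        ("hadoop-replication", 800),
        ("crm", 100),
        ("sensor-ingest", 500),
        ("hadoop-replication", 700),
        ("crm", 50),
    ]
    for app, total in top_talkers(flows, n=2):
        print(f"{app}: {total} bytes")
```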



Published at DZone with permission of

