What Is Data Sprawl?
Data sprawl is the overwhelming amount and variety of data that organizations create every day, and it is becoming unmanageable.
Imagine that you need to complete your taxes, but all your relevant papers are secreted in drawers, hidden in closets, and stuffed under couch cushions. Now imagine that you have multiple copies of the forms in these places, and some are written in Greek, while others are written in English and Spanish. How will you do your taxes, or clean your house for that matter, when this is the state of things? Unfortunately, this is a problem that is starting to plague companies across the world. This is data sprawl.
Data sprawl refers to the overwhelming amount and variety of data produced by enterprises every day. With the growing number of operating systems, data warehouses, employee-owned devices brought in under BYOD (Bring Your Own Device) policies, and enterprise and mobile applications, it's no wonder that the proliferation of data is becoming a problem.
The problem of data sprawl is twofold:
- Getting value from your data. The data is spread across many data stores, devices, and servers, which makes it difficult to get value from it. How can you perform comprehensive analytics when your data is stored in many locations, duplicated across them, and kept in different formats? How will you gather all this information in one place? How will you get your data into a common format so that you can compare apples to apples?
- Security. Data sprawl also creates security concerns. With BYOD proliferating in the workforce, endpoints must be secured even as data leaves your network on an array of devices. But what about the servers and data stores that are maintained by different departments? Are these systems secure? Do they all follow the same compliance requirements? Is personally identifiable information (PII) being removed when data moves from one system to another? Is the data encrypted when it's shared across systems? These are all security concerns that are magnified by data sprawl.
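To make the PII question concrete, here is a minimal sketch of stripping PII from a record before it moves to another system. The field names, the `strip_pii` and `pseudonymize` helpers, and the salt handling are all assumptions for illustration, not part of any particular tool. Note that a salted hash is pseudonymization rather than full anonymization, so treat this as a starting point.

```python
import hashlib

# Hypothetical field names; a real schema will differ.
PII_FIELDS = {"name", "email", "ssn"}


def pseudonymize(value: str, salt: str) -> str:
    """Replace a value with a salted SHA-256 digest so records can still be joined."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def strip_pii(record: dict, salt: str, keep_as_key: str = "email") -> dict:
    """Return a copy of the record with PII fields dropped.

    The field named by keep_as_key is replaced with a hashed version
    so the receiving system can still match records without seeing the raw value.
    """
    cleaned = {}
    for field, value in record.items():
        if field == keep_as_key:
            cleaned[field + "_hash"] = pseudonymize(str(value), salt)
        elif field not in PII_FIELDS:
            cleaned[field] = value
    return cleaned


record = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789", "plan": "pro"}
safe_record = strip_pii(record, salt="example-salt")
```

Only `plan` and `email_hash` survive in `safe_record`, so the downstream system can still group activity by customer without holding the raw identifiers.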
Why Does Data Sprawl Happen?
Data sprawl happens for many reasons.
- Employees may bring an array of devices to work and use those devices for work purposes.
- Vast numbers of new data sources are available, such as JSON files, new RDBMS sources, and streaming data from traffic sensors, health sensors, transaction logs, and activity logs.
- Your company may use varied operating systems such as Windows, macOS, and Linux.
- Your data may be stored in a variety of data storage systems across your network and the cloud.
- Your data might be siloed, so that it is stored in multiple places based on department, geography, or some combination of these.
- Your data may be duplicated across numerous systems and stored in a range of formats.
How Can You Manage Data Sprawl?
There are a number of tools to handle the security aspect of data sprawl. For example, there are many DLP (Data Loss Prevention) tools that help identify sensitive data in your network and ensure that it doesn't leave your network in non-secure ways. Popular vendors include Check Point, Forcepoint, and Symantec.
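As a rough illustration of the "identify sensitive data" step, the sketch below scans free text for two patterns. This is not how any of the vendors above work internally; the `PATTERNS` table and the `find_sensitive` helper are hypothetical, and the regular expressions are deliberately simple and will miss many real-world cases.

```python
import re

# Simplified, illustrative patterns only; production DLP rules are far more thorough.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text: str) -> dict:
    """Return the matches found for each pattern, omitting patterns with no hits."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


report = find_sensitive("Contact ada@example.com, SSN 123-45-6789")
```

A scan like this could run over files or outgoing messages, with any non-empty result flagged for review before the data leaves the network.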
For cloud applications, single sign-on (SSO) tools help employees seamlessly access applications outside of the network while maintaining a secure sign-on. Popular vendors include JumpCloud, Microsoft Azure, Okta, and OneLogin. This can help control security for BYOD endpoints.
But what about how data sprawl affects the way that you do business? What tools are available to help you handle your data, get it in one place, remove duplicates, and ensure that it's secure while you move it? A powerful ETL (Extract, Transform, and Load) tool can help you bring your data together where you can analyze it. As you move the data, you can cleanse it, remove duplicates, and transform the data types so that the formatting is aligned. Popular vendors include Alooma, IBM InfoSphere, Informatica, and Talend.
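The cleanse, deduplicate, and transform steps can be sketched in a few lines of plain Python. This is a toy version under stated assumptions, not the workflow of any vendor listed above: the two source extracts, their column names, the date formats, and the "first record per email wins" rule are all invented for illustration.

```python
from datetime import date, datetime

# Hypothetical extracts from two systems that describe the same customers differently.
crm_rows = [
    {"Email": "Ada@Example.com ", "signup": "2019-03-01", "spend": "120.50"},
]
billing_rows = [
    {"email": "ada@example.com", "signup": "03/01/2019", "spend": 120.5},
    {"email": "bob@example.com", "signup": "04/15/2019", "spend": "80"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw: str) -> date:
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def transform(row: dict) -> dict:
    """Cleanse one row: lowercase keys, trim the email, align data types."""
    row = {key.lower(): value for key, value in row.items()}
    return {
        "email": row["email"].strip().lower(),
        "signup": parse_date(row["signup"]),
        "spend": float(row["spend"]),
    }


def load(rows) -> list:
    """Deduplicate on email, keeping the first record seen for each address."""
    seen = {}
    for row in map(transform, rows):
        seen.setdefault(row["email"], row)
    return list(seen.values())


unified = load(crm_rows + billing_rows)
```

After the run, `unified` holds one record per customer with consistent email casing, real date objects, and numeric spend values, which is the "apples to apples" state that analytics needs.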
Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.