Trends in Customer Data
If we don’t start setting our data up like the office supply cabinet, there will be many unintended consequences as data continues to explode.
People assume all data is alike and that working with it is straightforward. But knowing how to run a spreadsheet or a query, or how to manage and analyze one kind of data, doesn't make all data the same. That premise is a fallacy.
Let's put data in perspective:
- 1 terabyte (TB) = 1 trillion bytes
- The Library of Congress holds 20 million books ≈ 10 TB
- All books ever written ≈ 400 TB
- All US academic research libraries ≈ 2 petabytes (PB)
- Since 2008, Google has been pulling in 12 PB per day
- A year of broadcasts from every TV station ≈ 200 PB
- In 2016, hardware manufacturers created 600 exabytes (EB) of storage
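To put those magnitudes side by side, here is a quick back-of-the-envelope calculation using the decimal (SI) prefixes the figures above imply; the numbers are taken directly from the list, and the comparisons are illustrative:

```python
# Back-of-the-envelope storage-scale comparison using decimal (SI) prefixes.
TB = 10**12  # 1 terabyte = 1 trillion bytes
PB = 10**15  # 1 petabyte = 1,000 TB
EB = 10**18  # 1 exabyte  = 1,000 PB

library_of_congress = 10 * TB        # ~20 million books
all_books_ever_written = 400 * TB
google_daily_intake = 12 * PB        # since 2008
storage_built_2016 = 600 * EB        # hardware created in 2016

# At 12 PB/day, Google ingests a Library of Congress's worth of books
# roughly every 72 seconds.
seconds_per_loc = 86_400 / (google_daily_intake / library_of_congress)
print(f"One Library of Congress every {seconds_per_loc:.0f} seconds")

# 2016's new storage could hold every book ever written 1.5 million times over.
copies = storage_built_2016 / all_books_ever_written
print(f"{copies:,.0f} copies of all books ever written")
```

The point of the arithmetic is the same as the list's: the scale is already far beyond what ad hoc habits can manage.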
We collect, store, and archive data at the convenience of the system that generates it. This is highly inefficient and ineffective. We need to store and archive data the way people want to use it. We don’t think about how the data is going to be used and it ends up in a data lake that deteriorates into a data swamp.
Today’s data environment = movement and integration of the following:
- In-house applications (ERP, CRM)
- External providers (third-party data)
- Data stores
- Analytical platforms
- Data delivery
- Business partners
- External applications
The data world has changed dramatically in 20 years, yet we rarely talk about who’s using the data and if they have the skills to do so. We're seeing the following trends in customer data:
- Mapping and finding the data
- Formalizing data sharing
- Attention to customer data protection
- Establishing data usage responsibilities
It’s about the quantity and diversity of the data you have. Analytics isn’t just about structured content. A radiologist analyzing MRI images does not have the same needs and skills as a business owner managing KPIs.
Organizations today just buy more storage if they have too much data rather than determining if the data is useful. Has the data we've been storing been touched? If not, is it worth keeping? Organizations need a data management strategy moving forward.
Finding the data is a big challenge for most organizations and their employees. Data is spread across dozens of platforms — and location awareness is based on tribal knowledge and relationships. Sharing occurs via thousands of specialized extracts and interfaces. There’s no method to identify (or reuse) existing technologies. Data sharing is a courtesy; it’s not a production responsibility.
As such, finding the data is time-consuming and expensive: on average, every employee spends one working day a week looking for it. We need an easy, intuitive way for people to know what data we do and do not have and how to access it.
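That one-day-a-week figure translates directly into cost. A rough calculation, with an assumed headcount and fully loaded salary (both numbers are illustrative, not from the article):

```python
# Rough annual cost of data hunting: one working day per week per employee.
employees = 1_000            # assumed headcount (illustrative)
avg_salary = 80_000          # assumed fully loaded annual cost, USD (illustrative)
fraction_searching = 1 / 5   # one of five working days spent looking for data

annual_search_cost = employees * avg_salary * fraction_searching
print(f"${annual_search_cost:,.0f} per year spent looking for data")
# -> $16,000,000 per year
```

Even with conservative assumptions, a fifth of payroll is a large price for not having a data catalog.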
Scientists have found that the human brain's capacity to process and record information, not economic constraints, may be the dominant limiting factor for the overall growth of globally stored information. As such, we need to establish a roadmap for customer data:
- A data card catalog
- Merchandising data like a product (measure, remove, and add)
- Formalizing and centralizing data sharing
- Onboarding everyone who will be accessing data: what we have, how to access it, and how to protect it
Dell was buying the same data from Acxiom nine times because it had no central place to determine its data needs, so it kept paying 9x rather than disrupt the status quo. This is true of most companies.
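A central registry of external data acquisitions would catch exactly this kind of duplication. Below is a minimal sketch; the `DataRegistry` class and the vendor/dataset/team names are invented for illustration:

```python
# Minimal sketch of a central data-acquisition registry: before buying an
# external dataset, check whether the company already licenses it.
class DataRegistry:
    def __init__(self):
        self._purchases = {}  # (vendor, dataset) -> owning team

    def request_purchase(self, vendor, dataset, team):
        key = (vendor.lower(), dataset.lower())
        if key in self._purchases:
            # Duplicate: point the requester at the existing license instead.
            return f"Already licensed by {self._purchases[key]} -- reuse it"
        self._purchases[key] = team
        return f"Approved: {team} is now the owner of record"

registry = DataRegistry()
print(registry.request_purchase("Acxiom", "consumer-demographics", "Marketing"))
# -> Approved: Marketing is now the owner of record
print(registry.request_purchase("Acxiom", "consumer-demographics", "Sales"))
# -> Already licensed by Marketing -- reuse it
```

The mechanism is trivial; what matters is that the check is mandatory and centralized rather than left to tribal knowledge.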
We live in an ad hoc world of data sharing. As we add more data sources we assume tribal knowledge will enable the organization to deal with it. This is as unrealistic as continuing to develop, test, and release software manually. The process must be automated to handle the scale at which data is accruing.
Data acquisition is based on the individual: customer data is retrieved from whatever source is convenient, and is often repurposed and unknowingly misused. There’s no recommended or preferred “system of record.” Data can be freely copied out of the company, resulting in too much data, too many copies, and added vulnerabilities.
The amount of data has grown 6x in three years. People are making extra copies of data and giving it to multiple people. Data sharing needs to be a responsibility and addressed as a business requirement.
We must break the paradigm of how we’ve been doing things for the past 20 years; it will not scale with the data. Why can’t we find the data we need the way we find products on Amazon, even when we don’t know their names? We don’t pay people to pull data; we pay people to do their jobs.
Identify a customer system of record. Make data like an office supply. Establish a data management team where it is a job rather than a one-off activity. We need to take data, especially customer data, more seriously.
Customer data is complex. There are dozens of systems containing customer information and the content varies by systems (meaning, value, etc.). Data is often gathered and stored with little thought regarding sensitivity, security, and access.
Most users assume the data is clean and accurate, and most people who know the data don’t want to acknowledge its questionable quality. To improve the situation, we need to formalize data usage responsibilities:
- Establish the goal of data self-service
- Define generally accepted principles for data (like Generally Accepted Accounting Principles)
- Develop customer data usage training
- Realize GDPR covers all of this — it's really common sense
Don’t call it "governance" — you'll scare people away. Refer to it as "company information policy." Just like we don’t send revenue to competitors and we don’t send salary data to employees, we need to build awareness and acceptance of data protection.
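One concrete form a “company information policy” can take is a deny-by-default rule table consulted before any customer data leaves a system. The field names and roles below are hypothetical, chosen only to show the shape of such a policy:

```python
# Sketch of a company information policy as a rule table: which roles may
# access which customer data fields. Fields and roles are illustrative.
POLICY = {
    "email":         {"support", "marketing"},
    "order_history": {"support", "analytics"},
    "ssn":           set(),  # no role may export this field
}

def may_access(role, field):
    """Deny by default: unknown fields and unlisted roles are refused."""
    return role in POLICY.get(field, set())

assert may_access("support", "email")
assert not may_access("analytics", "email")
assert not may_access("support", "ssn")   # sensitive fields stay locked down
assert not may_access("support", "fax")   # unknown fields are refused too
```

Framing the policy as data rather than prose makes it auditable and enforceable, which is what “governance” actually requires, whatever it ends up being called.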
Most companies work with dozens of suppliers and service partners. Customer data is shared to support order processing and delivery. Multiple organizations have individual customer data repositories. There’s no tracking of data content movement or location. Data is sourced from the most convenient source. There is a lack of data supervision. AT&T, Morgan Stanley, Advocate Health Care, Goldman Sachs, Barclays, and Merrill Lynch have all been fined for providing the wrong data. Employees of these companies sent data they shouldn’t even have had access to.
Hence the need to formalize customer data protection:
- Stop informal data copying and trading
- Formalize and publish customer data policies
- Plan for customer consent limitations
- HIPAA was established 15 years ago; GDPR is on the doorstep
Here's what we’re seeing with more forward-thinking organizations:
- The emergence of the chief data officer who puts together a data strategy and ensures its execution
- A data distribution team that establishes standard methods across OLTP and analytics
- The growth of shadow IT
- Teams handling specialized content and processing
- Targeting organizational needs and time-to-market issues
- More focus on analytics
Self-service isn’t bad. It’s the direction we have to go — it’s inevitable for data, AI, and ML. If we don’t start setting our data up like the office supply cabinet, there will be many other unintended consequences as data continues to explode.