Enterprise Data Management: Stick to the Basics
The basics of enterprise data management (EDM) are outlined here for those looking to better manage their company's data and metadata.
Lots of organizations have increasing volumes of data and are running data management programs to better sort it. Interestingly, the problems are pretty much the same across different sectors and industries, and data management helps solve them.
The fundamentals of enterprise data management (EDM), which you use to tackle these kinds of initiatives, are the same whether you're in the health sector, a telco, a travel company, or a government agency. The fundamental practices you need to follow to manage data are similar from one industry to another.
For example, suppose you're about to set off and design a program. It may be your integration platform project or your big warehouse project; either way, the principles for designing that program of work are pretty much the same regardless of the actual details of the project.
What Is EDM?
So, what is EDM? It's where you're doing a targeted uplift in data management as part of an extensive program of work. These programs go by lots of different names. They may not be called enterprise data management; they might be the enterprise data lake project, the big integration project, or the Big Data project. The names vary from one organization to another, but they're big, and they're intended, at an enterprise level, to improve what you're doing in terms of data management.
Data Management Capabilities
Several data management capabilities fall under the fundamental EDM capabilities. Data governance, data quality, metadata, and reference and master data are the critical ones. If these are present, and if you're focused on them and doing them well, they're a good indicator of the program's long-term success.
If they're a bit underdeveloped, then the program of work is likely to run into difficulties. Now, bear in mind, we're talking about capabilities: policy, people's knowledge, and automation and tools. We're not talking just about tools; a capability is broader than that.
Some capabilities are a little more critical than others; think of them as the spine. If those spine capabilities are underdeveloped, you could face severe difficulties in moving a program forward.
Of the spine capabilities, I believe that metadata and data governance are where you should focus your energies and efforts; there is a necessary dependency here. If you want to move on to data quality and reference and master data, you need to have good governance and metadata first.
Once your data management program is underway and you're starting to do capability uplifts in metadata and data governance, you'll find it much easier to nurture and grow some of those other capabilities, like data modeling, data design and data architecture, and so on.
Don't Suck All the Data
One of the biggest mistakes we see in lots of data management projects is people assuming the best course of action is to suck in all of the data, put it all in one place, and then fix it all up. You shouldn't bring data together until you've got a good reason to do so; and usually, that good reason is identified by an analysis of the metadata in your data catalog.
If you don't have that, you can't come up with a sensible reason to bring the datasets together. And if you do bring datasets together without the analysis backing the decision, you'll almost certainly produce nonsense. That's putting the cart before the horse. Unfortunately, it's a widespread mistake. Some of these capabilities are fundamental, and if you don't nail them early, your project is doomed to fail from the start.
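To make the idea concrete, here's a minimal sketch of what a catalog-driven merge decision might look like. The catalog structure, dataset names, and field names are all assumptions made for illustration; real data catalogs expose far richer metadata, but the principle is the same: check the metadata before you combine.

```python
# Hypothetical sketch: consult a data catalog before merging datasets.
# The catalog dict and its dataset names are illustrative assumptions,
# not a real catalog API.

def shared_join_keys(catalog: dict, a: str, b: str) -> set:
    """Return the columns two datasets share, according to the catalog."""
    return set(catalog[a]["columns"]) & set(catalog[b]["columns"])

def safe_to_merge(catalog: dict, a: str, b: str) -> bool:
    """Only merge when a shared key carries the same business definition."""
    for key in shared_join_keys(catalog, a, b):
        def_a = catalog[a]["definitions"].get(key)
        def_b = catalog[b]["definitions"].get(key)
        if def_a is not None and def_a == def_b:
            return True
    return False

catalog = {
    "crm_customers": {
        "columns": ["customer_id", "name"],
        "definitions": {"customer_id": "internal customer number"},
    },
    "billing_accounts": {
        "columns": ["customer_id", "balance"],
        "definitions": {"customer_id": "internal customer number"},
    },
    "web_visitors": {
        "columns": ["visitor_id", "pages"],
        "definitions": {"visitor_id": "anonymous cookie id"},
    },
}

print(safe_to_merge(catalog, "crm_customers", "billing_accounts"))  # True
print(safe_to_merge(catalog, "crm_customers", "web_visitors"))      # False
```

Here the CRM and billing datasets share a key with an identical definition, so combining them is defensible; the web visitor data has no such key, which is exactly the signal that merging it would produce nonsense.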
Have a Clear Statement of Scope
Another indicator, a clear statement of scope, might sound a bit obvious, but it's often overlooked or underdeveloped. Before you run into difficulties later in the project, have a clear definition of what types of data and information will be in or out of your project's scope.
Another essential aspect of scope is the boundary of the enterprise. So right at the beginning, be clear about where that boundary lies. With very, very large enterprises, this is particularly important. And again, if you get it wrong, it can come back to bite you.
Then, the next part is prioritizing. If you're a large organization, you've got so many different kinds of data that you'll want to bring it all into scope. However, you can't do everything all at once. In terms of prioritization, you must know what you are going to do first.
Capability Development Roadmap
Most organizations have been doing data management of some kind over the years. They've got legacies of mid-size mini-computers, and they've got all sorts of stuff, so to speak.
Ask yourself: what's your current state of maturity? You can identify the things that are done well, leave them alone for a while, and focus on the areas that are done poorly. Being able to come up with a quick assessment of the current state of maturity, and then define a target state of maturity, is an important thing to do.
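A current-versus-target assessment can be as simple as scoring each capability and flagging the gap. The capability names and the 1-to-5 scale below are assumptions for illustration, not a formal maturity model; the point is just that "leave it alone" versus "uplift needed" falls straight out of the comparison.

```python
# Hedged sketch: a quick current-vs-target maturity check.
# Capability names and the 1-5 scale are illustrative assumptions.

TARGET = 4  # assumed target maturity level on a 1-5 scale

def assess(current_scores: dict, target: int = TARGET) -> dict:
    """For each capability: leave alone if at target, otherwise note the gap."""
    report = {}
    for capability, score in current_scores.items():
        if score >= target:
            report[capability] = "leave alone for now"
        else:
            report[capability] = f"uplift needed (gap of {target - score})"
    return report

scores = {"data governance": 2, "metadata": 1, "data quality": 4}
for capability, verdict in assess(scores).items():
    print(capability, "->", verdict)
```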
Governance is all about clear, unambiguous, quickly understood, and communicated accountabilities and responsibilities. To do enterprise data governance, you need to know: what data do we have? Where did it come from? How did it get here? Who can access it? What can we do with it? Is it any good? All of that stuff is the bread and butter of metadata. And that metadata needs to be captured right at the beginning of your program of work and must remain organized and managed in your enterprise information catalog.
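One way to see how those governance questions map onto metadata is to model a catalog entry directly. The field names and example values below are made up for illustration; each field answers one of the questions above.

```python
# Illustrative sketch: the "bread and butter" governance questions
# captured as a catalog entry. All field names and values are assumptions.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str                  # what data do we have?
    source_system: str         # where did it come from?
    ingestion_path: str        # how did it get here?
    access_roles: list = field(default_factory=list)    # who can access it?
    permitted_uses: list = field(default_factory=list)  # what can we do with it?
    quality_score: float = 0.0                          # is it any good?

entry = CatalogEntry(
    name="customer_addresses",
    source_system="CRM",
    ingestion_path="nightly batch via the integration platform",
    access_roles=["data_steward", "marketing_analyst"],
    permitted_uses=["billing", "campaign targeting"],
    quality_score=0.92,
)
print(entry.name, entry.quality_score)
```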
Data literacy is critical through all tiers of management; it's not just an IT thing. It's not only developers who must be able to comprehend data; everybody needs at least a basic understanding. Data is the language of the business. If you don't speak data, you can't run your business properly.
Data stewards are people on the business side of the house. They might be attached to a program or a project, but regardless, they operate along a line of business. These are people who know data management well. They're the advocates and champions of doing data management right. They need to be embedded in the business because it is their business.
So stewardship is making stuff happen on the business side of the house. When you're unsure of where to go in terms of data management, talk to your steward. If it's too complicated even for them, they can refer it on to a specialist, which keeps the process simple. The sort of thing stewards need to understand is that business processes and data are just two sides of the same coin: business processes both produce and consume data.
Data Domain Definitions
Data Domain Definitions are a fundamental part of managing structured data. Everyone in the organization who uses and manages data needs to know that these exist and where to go to find them.
Data domains are crucial for data quality, and data domain definitions play a pivotal role in ensuring it. If data domain definitions aren't defined and managed in a repository, data quality practice will not move along very well.
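A minimal sketch of what a repository of domain definitions enables: values can be validated against the agreed definition, and an undefined domain fails loudly rather than silently. The domain names and value sets here are invented for illustration.

```python
# Minimal sketch of data domain definitions kept in a shared repository
# and used for validation. Domain names and values are assumptions.

DOMAIN_REPOSITORY = {
    "order_status": {"NEW", "PAID", "SHIPPED", "CANCELLED"},
    "country_code": {"AU", "NZ", "US", "GB"},
}

def validate(domain: str, value: str) -> bool:
    """Check a value against its domain; an undefined domain fails loudly."""
    if domain not in DOMAIN_REPOSITORY:
        raise KeyError(f"No definition for domain '{domain}' in the repository")
    return value in DOMAIN_REPOSITORY[domain]

print(validate("order_status", "PAID"))      # True
print(validate("order_status", "REFUNDED"))  # False
```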
In terms of metrics, there are two different kinds. The first concerns the data itself and its quality. What metrics do you use? How do you know that your data is better? What before-and-after metric shows that your data is improving? The second concerns the program: how do you know it's working? That goes back to the capability roadmap. How do you measure capability? Have you gone from red to yellow to green? These are the questions you must ask. If we can't find a detailed description of the metrics, the key performance indicators, and how you're going to measure things, that should be of concern.
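As a sketch of a before-and-after data quality metric, here is completeness of a required field, measured the same way at both points in time. The records and field name are made up; the important part is that the measurement is identical before and after, so the comparison is meaningful.

```python
# Sketch of a before/after quality metric: completeness of a required field.
# The sample records and the "email" field are illustrative assumptions.

def completeness(records: list, field_name: str) -> float:
    """Share of records with a non-empty value for field_name."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field_name))
    return filled / len(records)

before = [{"email": "a@x.com"}, {"email": ""}, {"email": None}, {"email": "b@x.com"}]
after  = [{"email": "a@x.com"}, {"email": "c@x.com"}, {"email": None}, {"email": "b@x.com"}]

print(completeness(before, "email"))  # 0.5
print(completeness(after, "email"))   # 0.75
```

The delta from 0.5 to 0.75 is the kind of before-and-after evidence the program needs to show the data is actually improving.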
Heterogeneous Information Processing
Most organizations have a legacy of lots of different stuff from different manufacturers, collected and acquired over a long time. You might still be running a mainframe; you've bought stuff from Oracle, things from SAP; you've got Microsoft, you've got Hadoop, you've got stuff everywhere. That's what I call a heterogeneous environment, and most large organizations have one. Now, this is important because it goes back to figuring out where the boundary of the enterprise is. If the enterprise is your corporate functions only, it might look fairly homogeneous; it might all be SAP. But suppose your enterprise's boundary includes your supply chain, your manufacturing and distribution, your marketing, and so on. As the boundary extends, the environment looks more and more heterogeneous, and things can go wrong if you don't plan for that.
Once you start building up your capabilities, say around metadata management, you need some automation and some tools support. You can't manage metadata across a massive organization without some tools support. So, where will you get your tools from? I'd suggest you go to someone who specializes in managing metadata in a heterogeneous environment.
Sometimes things like enterprise data management, metadata, and governance are all a bit woolly and abstract. You've got to be able to communicate with people and show them: here's some business value.
With large programs, you've got to be able to find the silver threads quickly, right at the beginning, especially when you want a success story. Everything relies on the success of those silver threads.
Opinions expressed by DZone contributors are their own.