How Do I Get More Insights From Metadata?
AI can be used to find the needles in a haystack, and the developer community is in a cool position to help future analysts using artificial intelligence.
It was great talking to Aaron Kalb, Co-Founder and Head of Product at Alation — an enterprise collaborative data platform that’s using artificial intelligence (AI) and machine learning (ML) to find information of value and insights in metadata.
Q: What are the keys to getting information of value from metadata?
A: Start by getting a handle on all of your data assets. A lot of companies don’t realize where all of their data resides. As storage costs decline, data noise proliferates and it gets harder to find the signal. Good metadata creates opportunities for employees to find insights using natural language and make more informed business decisions. Help your employees focus on higher-level tasks rather than things a robot could do.
Q: What are the most significant changes as metadata expands and evolves?
A: Companies are beginning to realize how much latent value their data has and have expanded their view of what metadata is to include the context of the data — when and where it was gathered and how people have used it in the past. We help clients look at how the data was queried and joined. This context provides insights into the future opportunities of what can be done with the data.
Q: What are the technical solutions you use to analyze metadata?
A: Success begins with defining the user problem and then determining what combination of natural language processing, machine learning, inferential AI, or even crowdsourcing will yield the desired outcome. It’s a design-before-development process.
Q: What are some real-world problems you are helping your clients solve?
A: One example is how to allocate human capital across a massive e-commerce platform with ten million columns of data. In a survey we conducted, respondents rated hundreds of tables as similarly important and saw a single person as the resident expert. We came in and performed a deep parsing of query logs to see how frequently the data was used and by whom. We identified the 90 percent of the tables used by zero or one person, the nine percent used by two to ten people, and the one percent used by 11 or more people, with the top handful used many times more than the runners-up. We also identified distinct experts for each of the data sets. This knowledge enabled new employees working with data to get their bearings in a week instead of six months.
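The query-log analysis described above can be sketched roughly as follows. This is a minimal illustration, not Alation's implementation: the log format (a list of user and SQL pairs), the function names, and the naive regular expression that pulls table names after FROM or JOIN are all assumptions made for the example. A production system would use a real SQL parser.

```python
import re
from collections import Counter, defaultdict

# Assumption: a naive pattern that grabs the identifier after FROM or JOIN.
# Real query logs need a proper SQL parser (subqueries, CTEs, quoting, etc.).
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(query_log):
    """Map each table name to a Counter of {user: number of queries}."""
    usage = defaultdict(Counter)
    for user, sql in query_log:
        for table in TABLE_PATTERN.findall(sql):
            usage[table.lower()][user] += 1
    return usage


def bucket_tables(catalog, usage):
    """Group every catalog table by how many distinct users queried it."""
    buckets = {"0-1 users": [], "2-10 users": [], "11+ users": []}
    for table in catalog:
        distinct_users = len(usage.get(table, ()))
        if distinct_users <= 1:
            buckets["0-1 users"].append(table)
        elif distinct_users <= 10:
            buckets["2-10 users"].append(table)
        else:
            buckets["11+ users"].append(table)
    return buckets


def likely_expert(table, usage):
    """Return the user who queried the table most often, or None if unused."""
    counts = usage.get(table)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical sample data, for illustration only.
sample_log = [
    ("ana", "SELECT * FROM orders o JOIN customers c ON o.cid = c.id"),
    ("bob", "select id from Orders"),
    ("ana", "SELECT 1 FROM orders"),
]
sample_catalog = ["orders", "customers", "legacy_tmp"]

sample_usage = table_usage(sample_log)
print(bucket_tables(sample_catalog, sample_usage))
print(likely_expert("orders", sample_usage))
```

Passing the full catalog in separately matters here: tables that nobody queries never appear in the logs, so they can only land in the lowest bucket if the catalog lists them.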
Q: What are the most common hurdles you have to help clients get over with regard to their metadata?
A: Risk mitigation in highly regulated industries, like financial services and healthcare, where they have PII and PHI data that cannot be exposed. And revenue optimization everywhere — we help provide a “bird’s eye” view of the data and then empower data analysts and scientists to find the data they need via natural language search while protecting the confidential portions of the data.
Q: What’s the future of using AI on metadata?
A: We will have systems that learn. In highly regulated industries like financial services and healthcare, learning will take place within the company. In other industries, companies may benefit from each other’s metadata. Computers can already make educated guesses about whether a series of numbers are phone numbers, Social Security numbers, or SKUs, and they get better with training data and human feedback. We will be perfecting human and computer interaction to learn faster, lining up decisions for humans that will maximally inform the algorithms and have the broadest impact.
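To make the "educated guess" idea concrete, here is a small rule-based sketch that labels a column of strings as phone numbers, Social Security numbers, or SKUs. It is a stand-in for the learned approach described above, not a version of it: the patterns are fixed by hand, the US-style phone and SSN formats are assumptions, and the SKU format (two to four capital letters, a hyphen, then three to six digits) is entirely hypothetical. The function name and the 0.8 threshold are also invented for the example.

```python
import re

# Assumed formats; the SKU pattern in particular is hypothetical.
PATTERNS = {
    "ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}"),
    "sku": re.compile(r"[A-Z]{2,4}-\d{3,6}"),
}


def guess_column_type(values, threshold=0.8):
    """Return the label whose pattern matches the largest share of values.

    Falls back to "unknown" when the column is empty or when no pattern
    matches at least `threshold` of the values.
    """
    if not values:
        return "unknown"
    best_label, best_share = "unknown", 0.0
    for label, pattern in PATTERNS.items():
        matches = sum(1 for v in values if pattern.fullmatch(v.strip()))
        share = matches / len(values)
        if share > best_share:
            best_label, best_share = label, share
    return best_label if best_share >= threshold else "unknown"


# Hypothetical sample column, for illustration only.
print(guess_column_type(["(555) 867-5309", "555-123-4567", "5551234567"]))
```

In a system that learns, the hand-written patterns and threshold would be replaced or tuned using labeled examples, and the low-confidence columns are the ones worth routing to a person for confirmation.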
Q: What are your concerns about AI being used with metadata?
A: One concern is blindly picking a technique instead of using the right combination for the problem at hand. Augmented intelligence is often superior to AI. For example, when diagnosing tumors, AI and humans working independently can be wrong. When AI and humans work together, they tend to produce very accurate results. Another concern is training algorithms and interpreting results over bad metadata. For example, an algorithm trained to impartially predict whether a parolee will re-offend based on historical data wound up displaying racist tendencies. That’s because its training data was mislabeled as to whether or not the individual “committed a crime” when what was actually being measured was whether the individual was arrested and convicted. There are known systemic biases in both arrest and conviction rates, so the algorithm was perpetuating the flawed system.
Q: What skills do developers need to use AI to add value to metadata?
A: Use DZone and Stack Overflow, but also understand some of the theory and computer science. Learn the strengths and weaknesses of the tools and the types of problems they are most suited to handle. Instead of getting lost in the different formats of metadata and different AI and ML algorithms, think about what the end user needs and work backward.
Q: What else do we need to consider with regard to metadata?
A: The developer community is in a cool position to help future analysts. Be kind to your future self and future colleagues: label and document your work so others can follow what you’ve done and don’t have to pull you off your next project to explain what you did on this one.