Using Data Matching to Resolve Identity Resolution Challenges
Speed and accuracy determine the efficiency of a data matching tool to drive identity resolution management goals. Here's how.
Consumers interact with a brand through hundreds of touchpoints across devices, platforms, and channels. During the buyer’s journey, consumers use 3-4 internet-connected devices, and by 2021 that number is expected to increase to 13. This growth in device usage brings a corresponding surge in data. To keep up with the influx, organizations need proper data cleansing strategies in place so that their data stays up-to-date, accurate, and consistent.
Companies gather this data from various consumer touchpoints and use it to design better, personalized experiences for consumers. When the data is gathered using multiple disparate systems, as it normally is nowadays, it becomes crucial to perform identity or entity resolution.
Identity resolution: the process of relating multiple records on the basis of ‘unique identifiers’ such that all matching records represent a single user/entity.
The identity resolution process outputs a single, accurate, 360-degree view of each entity, with all of its behavioral, transactional, and engagement records connected together. This way, you’re able to understand the user as a whole, rather than trying to make sense of disparate information.
Why Does Your Organization Need Identity Resolution?
Organizations often misjudge the real importance of entity resolution for their enterprise. It is not only about addressing a prospect or customer by their correct first name in an email. Rather, it is about taking a conscious step toward knowing your prospect better and designing personalized experiences for them. It is about identifying patterns and behaviors associated with a single user across various engagement systems and using them to maximize brand impact and lead conversion.
As mentioned in the Forrester study “Is your identity program built on a house of cards?”, here are the top five reasons for implementing entity resolution on your databases:
- More complete profiles of your leads, prospects, and customers, that allow you to design better, personalized experiences according to their behavioral patterns and preferences.
- Better data controls and security over your organizational data, allowing you to follow data compliance standards and guidelines such as GDPR, CCPA, and HIPAA.
- Opportunities to upsell and cross-sell your products and services to existing customers, and to shape the customer journey by offering relevant recommendations.
- More accurate and effective marketing measurement, such as qualified leads, lead conversion rates, return on marketing investment, and customer engagement.
- Improved data analytics that gives an accurate, complete, and consistent view of brand image, perception, and experience.
How to Perform Identity Resolution?
An identity resolution process relates three types of data about an individual:
Terrestrial information: involves a user’s personal contact information, such as name, home and work addresses, phone number, etc.
Device information: involves IP data or other information that uniquely identifies the devices that are associated with a user.
Digital information: involves email addresses, social media profiles, website visits, CTA clicks, resource downloads, etc.
The identity resolution process has the following five steps:
Step 1: Identify variables that represent an entity
It involves identifying different platforms, channels, and devices that are used by an entity during their buying journey.
Step 2: Map all user interactions
In this step, the information gathered in step 1 is related together to construct various interactions or touchpoints a user had with your brand.
Step 3: Construct the buyer’s journey via data matching
Now that you have identified all touchpoints of a user, it’s time to relate the different interactions to each other to understand the complete buyer’s journey. This step requires you to match the records of all these interactions so that you can assess which of them belong to the same entity.
In some cases, this data matching is fairly simple because each record carries a unique identifier, such as an email or IP address. But where unique identifiers don’t exist, more complex data matching algorithms need to be implemented to perform phonetic, numeric, or fuzzy matching.
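To make the idea of phonetic and fuzzy matching concrete, here is a minimal sketch using only the Python standard library. The function names `soundex` and `similarity` are illustrative choices rather than part of any particular product, and the Soundex variant shown is a simplified one; a production matching engine would typically combine several such measures.

```python
from difflib import SequenceMatcher


def soundex(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    encoded = name[0].upper()
    previous = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != previous:
            encoded += digit
        if ch not in "hw":  # h and w do not separate repeated codes
            previous = digit
    return (encoded + "000")[:4]


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity between 0.0 and 1.0, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Phonetic match: different spellings, same code.
print(soundex("Smith"), soundex("Smyth"))
# Fuzzy match: a high ratio suggests the same person despite the typo.
print(round(similarity("Jon Smith", "john smith"), 2))
```

Phonetic codes tolerate spelling variations that sound alike, while the similarity ratio tolerates typos and missing characters; which one fits depends on the field being compared.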
Step 4: Validate the matched results
In this step, you need to verify that the interactions labeled as belonging to the same individual really do belong together, and decide what to do with the interactions left unmatched.
Step 5: Create the golden record
Based on the matched and validated results, you can now create a master golden record that serves as the single source of truth that shows the complete journey of your leads, prospects, and customers. This becomes the driver of all your marketing and sales efforts, as it gives an accurate, correct, and consistent view of the data.
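As a rough illustration of this step, the sketch below merges records that have already been matched to one entity. It assumes a simple survivorship rule (the most recently updated non-empty value wins for each field) and a hypothetical `updated` field on every record; real projects often use different rules per field.

```python
from datetime import date


def build_golden_record(records):
    """Merge records of one entity; the newest non-empty value wins per field."""
    golden = {}
    # Process oldest first so that newer values overwrite older ones.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if field != "updated" and value not in (None, ""):
                golden[field] = value
    return golden


# Two hypothetical records that were matched to the same person.
crm = {"name": "J. Doe", "email": "jane@example.com", "phone": "",
       "updated": date(2020, 1, 5)}
web = {"name": "Jane Doe", "email": "", "phone": "555-0100",
       "updated": date(2020, 3, 1)}

golden = build_golden_record([crm, web])
print(golden)
```

Empty values never overwrite existing ones, so the golden record keeps the email from the older record while taking the name and phone from the newer one.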
Challenges to Overcome While Resolving Entities
The identity resolution process is pretty straightforward. But there are multiple challenges encountered while performing these steps. The most important challenges are listed below:
Missing, Incomplete, or Inconsistent Unique Identifiers
As explained in the above process, all user interactions are related together to construct the complete buyer’s journey. This is carried out based on the data fields that uniquely identify the entity, such as email address, device IP information, etc. But it is quite difficult to have complete and consistent unique identifiers in all your datasets coming from various engagement systems.
Here are some scenarios that need to be solved before accurate data matching can take place:
- Unique identifiers exist but are incomplete: this happens when some systems fail to capture the uniquely identifying data fields for certain user interactions.
- Unique identifiers exist but are inconsistent: this happens when data from various systems is integrated together to complete the buyer’s journey. In this case, you have unique identifiers in each dataset, but they are not the same. Maybe one application uses an email address to identify a user, while the other application uses an IP address.
- Unique identifiers do not exist at all: in this case, you need to combine different fields together to uniquely identify an interaction. For example, the name field along with contact phone or mailing address may give uniqueness to a user interaction record.
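The last scenario, combining fields into a substitute identifier, can be sketched as follows. The field names (`name`, `phone`, `event`) and the helper names are assumptions made for illustration; the point is only that normalized fields are combined into one key before grouping.

```python
import re
from collections import defaultdict


def composite_key(record):
    """Build a hypothetical match key from normalized name and phone."""
    name = " ".join(record["name"].lower().split())  # collapse case and spacing
    phone = re.sub(r"\D", "", record["phone"])       # keep digits only
    return (name, phone)


def group_by_key(records):
    """Group interaction records that share the same composite key."""
    groups = defaultdict(list)
    for record in records:
        groups[composite_key(record)].append(record)
    return dict(groups)


interactions = [
    {"name": "Jane Doe", "phone": "(555) 010-0100", "event": "webinar signup"},
    {"name": "jane  doe", "phone": "555.010.0100", "event": "pricing page visit"},
    {"name": "John Roe", "phone": "555-010-0199", "event": "ebook download"},
]

groups = group_by_key(interactions)
for key, events in groups.items():
    print(key, [e["event"] for e in events])
```

An exact composite key like this only works when the combined fields are clean; where they contain typos, the key would need to be paired with fuzzy comparison.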
Unclean and Unstandardized Data
Poor data quality is another common issue associated with entity resolution. For your records to be comparable and resolvable to form entities, you need clean and standardized data. This requires you to make sure your data records contain information that is accurate, complete, consistent, unique, valid, and up-to-date. If your data records do not measure up to these six critical dimensions of data quality, then expect your resolved entities to have very low accuracy levels.
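As a small, hedged example of what cleaning and standardizing can mean in practice, the sketch below normalizes two fields and measures one of the quality dimensions, completeness. The field names and the functions `standardize` and `completeness` are hypothetical, and a real pipeline would cover many more fields and dimensions.

```python
def standardize(record):
    """Return a copy with whitespace, casing, and email format normalized."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def completeness(records, field):
    """Share of records (0.0 to 1.0) that have a non-empty value for field."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field))
    return filled / len(records)


raw = {"name": "  jane   DOE ", "email": " Jane@Example.COM "}
print(standardize(raw))

sample = [{"email": "a@example.com"}, {"email": ""},
          {"email": "b@example.com"}, {}]
print(completeness(sample, "email"))
```

Running a profile like `completeness` before matching gives an early signal of how reliable a field will be as a match criterion.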
When we consider resolving entities, it means comparing data records to assess which records belong to the same individual. In this process, every data record must be compared with every other record in the same dataset. And as most organizations use multiple data applications that track user interactions, a single record is also compared with all records present across multiple datasets.
The number of these comparisons grows quadratically as the size of the database grows. This means your identity resolution process must run on a system that can handle such a computational load.
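To see how quickly the workload grows, note that comparing every record with every other record in a dataset of n records takes n(n-1)/2 comparisons. The sketch below computes that count and also shows blocking, a common mitigation that is not described in this article: records are only compared when they share a cheap key. The `zip` field and the function names are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n: int) -> int:
    """Number of comparisons when every record is compared with every other."""
    return n * (n - 1) // 2


def blocked_pairs(records, block_key):
    """Yield candidate pairs only for records that share the same block key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    for group in blocks.values():
        yield from combinations(group, 2)


print(all_pairs_count(1_000))      # comparisons for one thousand records
print(all_pairs_count(1_000_000))  # comparisons for one million records

records = [
    {"name": "Jane Doe", "zip": "10001"},
    {"name": "Jane D.", "zip": "10001"},
    {"name": "John Roe", "zip": "94105"},
    {"name": "Jon Roe", "zip": "94105"},
]
candidates = list(blocked_pairs(records, lambda r: r["zip"]))
print(len(candidates), "candidate pairs instead of", all_pairs_count(len(records)))
```

The trade-off with blocking is that true matches whose block keys differ (for example, a mistyped postal code) are never compared, so the choice of key matters.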
Tuning Records Matching Algorithms to Maximize Accuracy
Data matching algorithms must be tuned to achieve maximum accuracy on a given dataset, and it is a considerable challenge to ensure that the tuned variables produce the fewest possible false positives and false negatives.
One of the key struggles with entity resolution is the amount of effort that goes into manually reviewing each record classified incorrectly or left unmatched. Traditional data matching methods that rely solely on deterministic algorithms do little to relieve businesses from this dilemma.
Moreover, they don’t allow for easy fine-tuning, making it difficult for the user to truly get optimized results.
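One way to make tuning less of a guessing game is to score a small, manually labeled sample of record pairs and check how the match threshold trades false positives against false negatives. The sketch below is a minimal illustration under that assumption; the scores and labels are made-up sample values, not results from any real dataset.

```python
def precision_recall(scored_pairs, threshold):
    """Precision and recall when pairs scoring >= threshold count as matches."""
    tp = fp = fn = 0
    for score, is_match in scored_pairs:
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1          # correctly matched
        elif predicted:
            fp += 1          # false positive: matched but different entities
        elif is_match:
            fn += 1          # false negative: same entity but left unmatched
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# (similarity score, reviewer says it is a true match): illustrative values only.
labeled = [(0.95, True), (0.90, True), (0.80, False), (0.75, True), (0.40, False)]

for threshold in (0.70, 0.85):
    p, r = precision_recall(labeled, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

A lower threshold catches more true matches at the cost of more false positives, and a higher one does the opposite, which is why the threshold has to be chosen per dataset.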
Using a Self-Service Data Cleansing and Matching Engine for Identity Resolution
We reviewed the entire identity resolution process as well as the challenges that are usually encountered during its implementation. Multiple solutions and systems can be used to overcome these challenges, but the smart decision is to integrate an automated, self-service tool that performs various steps of data profiling, cleansing, matching, deduplication, and data merge, all together in a single platform.
Published at DZone with permission of Zara Ziad.