Identity Resolution

What is Identity Resolution?

Identity Resolution is the process of correctly determining the identity of some real entity based upon verified data links that serve as a real, but often changing, set of clues. Identity Resolution is used in a variety of applications – e.g., linking census records, spam detection, or uniquely identifying a node on a large computer network. In the business world, one of the most common identity resolution problems has to do with consumer recognition. There are two aspects to this:
• Looking at records across multiple channels to ensure that individuals are linked together and represented accurately
• Looking at various records in a single CRM database and trying to ensure that each record accurately represents one individual (a unique reference)

Let’s take a very simple example that gives a sense of the complexity of these issues. We'll use a made-up company named "MyCompany" for this example.

MyCompany makes clothing for both adults and teenagers. The product sets are very different and MyCompany sends customized catalogs to customers based on whether they are in one group or the other. If MyCompany has no information on the age of the customer, they send the catalog for adults by default.

Shown below are two records, with eleven identity attributes each, in the direct mail CRM database in the catalog division of MyCompany, Inc. Each of these records is a representation of (or reference to) an entity in the real world. MyCompany wants to determine whether these two entity representations refer to the same person, because it costs them ~$15 to print and ship a single catalog and they don’t want to waste money sending duplicates to the same person or sending the catalog for one segment to a person in the other.

First	Middle	Last	Suffix	Age	Street Number	Directional	Street Name	City	State	Zip
John	Alan	Smith		45	123	S.	Main Street	Sometown	OH	34213
John	A	Smith	Jr		123		Main Street	Sometown	OH	34213

You might say "Obviously they are the same person." MyCompany did. So MyCompany sent one adult catalog to the address.

But it isn’t so obvious. Let’s assume, for a moment, that the addresses are actually the same. In reality, it turns out that John A Smith Jr. is the son of John Alan Smith and they happen to live together. In this case the correct records should be:

First	Middle	Last	Suffix	Age	Street Number	Directional	Street Name	City	State	Zip
John	Alan	Smith		45	123	S.	Main Street	Sometown	OH	34213
John	Alan	Smith	Jr	15	123	S.	Main Street	Sometown	OH	34213

MyCompany should, in reality, have sent two catalogs, one targeted to the adult segment and the other targeted to the teen segment. To accurately represent this in their database, they should break apart their single record into two separate records, one for each customer.

On the other hand, let’s assume the addresses are in reality not the same. Instead, we find that the only error is the zip code:

First	Middle	Last	Suffix	Age	Street Number	Directional	Street Name	City	State	Zip
John	Alan	Smith		45	123	S.	Main Street	Sometown	OH	34213
John	Aaron	Smith	Jr		123		Main Street	Sometown	OH	34214

In this case, MyCompany, until recently, missed a customer because they assumed the two were the same person at the same address.

The key thing to note here is how one change in a single attribute, even in a simple example, can completely change the conclusion about the identity of an entity and substantially impact business decisions. When you add in name changes and address changes over a person’s life, determining equivalence between references quickly becomes challenging. When we go across databases, the situation becomes even more complex.
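
To make this concrete, here is a minimal Python sketch of a naive matching rule applied to the records from the example. The rule itself (compare first name, last name, street number, street name, and zip, and ignore everything else) and the name `naive_same_person` are illustrative assumptions, not a description of how MyCompany or AbiliTec actually resolves identities.

```python
from typing import Dict

# Attributes the hypothetical naive rule compares.
# Middle name, suffix, and age are deliberately ignored.
NAIVE_KEYS = ("first", "last", "street_number", "street_name", "zip")


def naive_same_person(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """Return True when every compared attribute is identical, ignoring case."""
    return all(a[k].strip().lower() == b[k].strip().lower() for k in NAIVE_KEYS)


father = {"first": "John", "middle": "Alan", "last": "Smith", "suffix": "",
          "age": "45", "street_number": "123", "street_name": "Main Street",
          "zip": "34213"}
son = {"first": "John", "middle": "A", "last": "Smith", "suffix": "Jr",
       "age": "", "street_number": "123", "street_name": "Main Street",
       "zip": "34213"}
# Same reference as `son`, but with the middle name and zip from the third table.
other = dict(son, middle="Aaron", zip="34214")

merged_father_son = naive_same_person(father, son)      # two people collapsed into one
merged_father_other = naive_same_person(father, other)  # a one-digit zip change flips the result
```

Under this rule the father and son are merged because the suffix and age are never consulted, while a one-digit difference in the zip code is enough to separate two otherwise identical references.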

LiveRamp's Approach in the AbiliTec API

LiveRamp uses the link approach to identity resolution in our AbiliTec API. AbiliTec uses all the documents regarding an entity, as they are available and collected over time. In many cases, more than 40 years of historical data is available for resolution purposes. AbiliTec is thus capable of resolving references within a single data store with a very high degree of confidence. In addition, AbiliTec is able to recognize individuals across multiple addresses and across multiple channels: direct mail, email, online, mobile, and phone. This is due to the number of data sources that have been available to AbiliTec over the years.

AbiliTec has three kinds of links. Consumer links represent an entity that is a consumer. Address links represent a site or physical location. Household links represent a collection of individuals over the age of 18 who reside together. Address links are returned with consumer links in entity documents. As a result, the API can use these links to return information about a person, a place, a household, or an entity, depending on the developer’s needs. This return information is contained in documents. There are five kinds of documents:

• Person
• Place
• Household
• Entity
• Group

When an AbiliTec link is provided (through the AbiliTec API lookup endpoint), each type of document returns the appropriate kind of information in a structured format.

Determining Equivalence

Determining whether two or more reference instances are equivalent is called matching. Linking and matching are often confused. Linking two references means assigning them a common identifier, called a link value, to indicate that the references are equivalent. Matching two references means applying an algorithm that measures the degree of similarity between some set of attribute values. If the degree of similarity reaches a pre-defined threshold, the two references are said to match.
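
The distinction between matching and linking can be sketched in a few lines of Python. This is only an illustration: the similarity measure (the standard library's `difflib.SequenceMatcher` averaged over shared attributes), the threshold of 0.85, and the integer link values are all assumptions chosen for the example, not AbiliTec's algorithm or link format.

```python
from difflib import SequenceMatcher
from itertools import count
from typing import Dict, List

THRESHOLD = 0.85  # hypothetical pre-defined threshold


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Matching step 1: average string similarity over the attributes both references share."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    scores = [
        SequenceMatcher(None, a[k].strip().lower(), b[k].strip().lower()).ratio()
        for k in keys
    ]
    return sum(scores) / len(scores)


def is_match(a: Dict[str, str], b: Dict[str, str], threshold: float = THRESHOLD) -> bool:
    """Matching step 2: two references match when similarity reaches the threshold."""
    return similarity(a, b) >= threshold


def assign_links(records: List[Dict[str, str]], threshold: float = THRESHOLD) -> List[int]:
    """Linking: give each record the link value of the first earlier record it matches,
    or a fresh link value if it matches none."""
    next_link = count(1)
    links: List[int] = []
    for i, record in enumerate(records):
        link = None
        for j in range(i):
            if is_match(record, records[j], threshold):
                link = links[j]
                break
        links.append(link if link is not None else next(next_link))
    return links


records = [
    {"name": "John Smith", "street": "123 Main Street"},
    {"name": "john smith", "street": "123 main street"},
    {"name": "Mary Jones", "street": "9 Oak Avenue"},
]
links = assign_links(records)
```

Here the first two records match (they differ only in case) and therefore share a link value, while the third receives a new one. Matching produces the decision; linking records it.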

Common matching techniques currently used in the marketplace include direct matching, probabilistic matching, approximate string matching, and other typical matching approaches such as edit distance routines to provide matches across names and addresses present within a client’s data. AbiliTec combines these techniques with its own proprietary algorithms to achieve higher confidence in matching. Once records are matched, they are assigned a persistent AbiliTec Link.
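
As an illustration of one of the techniques named above, the following is a textbook Levenshtein edit distance routine in Python. It is a generic sketch of approximate string matching, not AbiliTec's proprietary implementation, and the function name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Keep the shorter string in b so each row of the table stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("catalog", "catalogue")
```

A small distance relative to the string length suggests two values may be variants of the same name or address; the cut-off that counts as "close enough" is a policy choice and has to be tuned to the data.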

While AbiliTec uses forms of approximate string matching to make matches, the lookup endpoint does not. A request submitted via the lookup endpoint to the AbiliTec knowledgebase using

• an Entity Representation (some combination of name, postal address, phone number, or email address) or
• a Hashed Entity Representation (a SHA-1 hash of an entity representation)

will only return a document if the input entity representation exactly matches an index key for a document.
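
The exact-match behavior follows naturally from hashing: a one-character change in the input produces a completely different SHA-1 digest. The Python sketch below illustrates the idea with the standard `hashlib` module. The normalization rules, the "|" separator, the in-memory `index`, and the link value shown are all hypothetical; the formatting the API actually requires for a hashed entity representation is not specified here.

```python
import hashlib
from typing import Dict, Optional


def normalize(value: str) -> str:
    """Assumed normalization: trim, lowercase, and collapse internal whitespace."""
    return " ".join(value.strip().lower().split())


def hashed_er(*parts: str) -> str:
    """Return the SHA-1 hex digest of an entity representation.
    Joining the normalized parts with '|' is an assumption for this example."""
    canonical = "|".join(normalize(p) for p in parts)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Hypothetical stand-in for an index of document keys.
index: Dict[str, Dict[str, str]] = {
    hashed_er("John Smith", "john.smith@example.com"): {
        "document": "person",
        "link": "HYPOTHETICAL-LINK-VALUE",
    }
}


def lookup(idx: Dict[str, Dict[str, str]], *parts: str) -> Optional[Dict[str, str]]:
    """Exact-match lookup: returns a document only when the hash equals an index key."""
    return idx.get(hashed_er(*parts))


found = lookup(index, "  JOHN   smith ", "John.Smith@Example.com")
missing = lookup(index, "Jon Smith", "john.smith@example.com")
```

After normalization the first query hashes to the same key and returns the document. The second differs by one letter in the name, so its digest matches nothing and no document is returned, even though an approximate string comparison would consider the two names close.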

The Match endpoints, on the other hand, do provide a best match based on approximate string matching.