NameSearch Probabilistic and Deterministic Matching
NameSearch® is extremely versatile and supports both deterministic
and probabilistic matching.
Probabilistic matching
is achieved through the derivation of a weighting scheme
that considers the frequency of identification information to formulate
a score and/or ranking. In principle, the goal is to apply lighter
weights to common elements and use the uncommon information
to differentiate
records.
As an example, the record John Smith at 15 Vorris Dr. has a
common name element, “John Smith”, and an uncommon street
address. Proponents of this theory would state that applying
greater weight to the uncommon
street address yields a more precise match.
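
The weighting idea described above can be illustrated with a minimal sketch. It is not NameSearch® code: the function names, the sample records, and the choice of a negative base-2 logarithm of relative frequency as the weight are all assumptions made for illustration. The only point is that a rare value (the street address) contributes more to the score than a common one (the name).

```python
import math
from collections import Counter


def build_weights(values):
    """Return one weight per distinct value; rarer values get larger weights."""
    counts = Counter(values)
    total = len(values)
    # Assumed weighting: -log2 of the relative frequency.
    # A value held by a quarter of the records weighs 2.0.
    return {value: -math.log2(count / total) for value, count in counts.items()}


def match_score(rec_a, rec_b, weights_by_field):
    """Sum the weights of the fields on which both records agree exactly."""
    score = 0.0
    for field, weights in weights_by_field.items():
        if rec_a[field] == rec_b[field]:
            score += weights.get(rec_a[field], 0.0)
    return score


# Hypothetical sample data: a common name and one uncommon street address.
records = [
    {"name": "John Smith", "street": "15 Vorris Dr."},
    {"name": "John Smith", "street": "8 Main St."},
    {"name": "John Smith", "street": "8 Main St."},
    {"name": "Ann Lee", "street": "8 Main St."},
]

weights_by_field = {
    field: build_weights([rec[field] for rec in records])
    for field in ("name", "street")
}
```

In this sample, agreement on "15 Vorris Dr." is worth more than agreement on "John Smith", so two records that share the rare address outrank two that share only the common name.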
NameSearch® comes with a function called predict. Predict is used
to calculate the number of records that would be returned from a query
without accessing the underlying database. To use the predict function,
you must first have a representative sample of names and/or addresses.
The sample will be used as input into the predict frequency analysis tool
that is part of the NameSearch® graphical user interface. The predict
frequency routine creates a special table that is used by the predict function.
By combining predict with its advanced comparison routines, NameSearch® gives
its clients the capability to perform probabilistic matching.
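
The general idea of estimating a result count from a frequency table, rather than from the database itself, can be sketched as follows. This is not the NameSearch® predict function or its table format, neither of which is documented here. The names `build_frequency_table` and `predict_count`, the case-insensitive keying, and the linear scaling from sample to population are all hypothetical.

```python
from collections import Counter


def build_frequency_table(sample_names):
    """Count each name in a representative sample, ignoring case and padding."""
    table = Counter(name.strip().upper() for name in sample_names)
    return table, len(sample_names)


def predict_count(name, table, sample_size, population_size):
    """Estimate how many records a query on `name` would return.

    Assumes the sample is representative, so the share of the sample
    holding the name is scaled up to the full population.
    """
    if sample_size == 0:
        return 0
    hits = table.get(name.strip().upper(), 0)
    return round(hits / sample_size * population_size)


# Hypothetical sample of surnames drawn from a larger population.
sample = ["Smith", "Smith", "Jones", "Vorris"]
table, sample_size = build_frequency_table(sample)
estimate = predict_count("smith", table, sample_size, population_size=1000)
```

An estimate of this kind is only as good as the sample: a name that never appears in the sample is predicted to return nothing, which is one reason the sample needs to be representative.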
Opponents of this methodology raise pointed criticisms.
The lack of a unifying scoring scheme produces seemingly unpredictable
results. Even though a specific algorithm is used for determining scores,
users get frustrated when they observe the application of varied match
criteria. On-line applications utilizing this methodology are often abandoned
because their results appear erratic. Empirical study has shown that
differentiating records by placing greater weight on uncommon elements
is flawed. Uncommon elements have a greater degree of variation, so
relying on the field that is most prone to errors
causes many good matches to be missed.
Deterministic matching is significantly easier to implement and maintain.
The probabilistic approach requires extensive frequency analysis to be
performed in order to understand the frequency characteristic of the data
population. To ensure accuracy, volatile and fluid data populations require
constant maintenance. Deterministic matching yields highly uniform results
and has been shown to be more tolerant of variations and to have fewer false
positives. False positives are those situations where two records are incorrectly
matched.
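
For contrast with the weighted approach, a deterministic rule can be sketched in a few lines. This is an illustrative assumption, not the NameSearch® comparison routines: the `normalize` and `deterministic_match` helpers and the rule itself (every listed field must agree after normalization) are hypothetical. The same fixed criteria are applied to every pair of records, which is what makes the outcome uniform.

```python
import re


def normalize(text):
    """Uppercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", text.upper())
    return " ".join(cleaned.split())


def deterministic_match(rec_a, rec_b, fields=("name", "street")):
    """Return True only if every listed field agrees after normalization."""
    return all(normalize(rec_a[f]) == normalize(rec_b[f]) for f in fields)


# Hypothetical records: b differs from a only in case, spacing and punctuation.
a = {"name": "John Smith", "street": "15 Vorris Dr."}
b = {"name": "john  smith", "street": "15 Vorris Dr"}
c = {"name": "John Smith", "street": "8 Main St."}

same_person = deterministic_match(a, b)
different_address = deterministic_match(a, c)
```

Because no frequency table is involved, the rule needs no maintenance as the data population changes, and a user can always explain why two records did or did not match.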
NameSearch® supports the probabilistic model; however,
deterministic matching is known to yield significantly better and more
reliable results.