Comparison Algorithms
NameSearch® comparison algorithms compute numeric values
that represent the likelihood of a match. Two entities are passed to the
comparison routine, and a score or an array of scores is returned.
There are many ways of calculating a score. For example, William Jefferson Clinton
versus Bill Clinton could yield scores of 100, 50, 66 and 33. The basic approach
is to divide the number of matches by the number of tokens and multiply
that figure by 100. In this example the number of tokens is either
two or three: Bill Clinton has two tokens and William Jefferson Clinton
has three. The only word that matches exactly is Clinton. Dividing the number
of matches by the number of tokens and multiplying by 100, we get (1/2 *
100 = 50) and (1/3 * 100 = 33). Alternatively, Bill and William are used
interchangeably and can be considered to match. In this instance the scores
would be (2/2 * 100 = 100) and (2/3 * 100 = 66), with fractions dropped.
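The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the basic matches-over-tokens calculation, not the NameSearch implementation: the function names and the one-entry nickname table are hypothetical, and fractions are dropped to agree with the 33 and 66 shown in the example.

```python
# Hypothetical nickname table standing in for rulebase knowledge;
# the real NameSearch rulebase is not described in this document.
NICKNAMES = {"bill": "william"}


def canonical(token, use_nicknames):
    """Lower-case a token and optionally map a nickname to its formal name."""
    token = token.lower()
    return NICKNAMES.get(token, token) if use_nicknames else token


def token_scores(name_a, name_b, use_nicknames=False):
    """Return (score relative to name_a, score relative to name_b).

    Each score is matches / tokens * 100, with the fraction dropped.
    """
    tokens_a = [canonical(t, use_nicknames) for t in name_a.split()]
    tokens_b = [canonical(t, use_nicknames) for t in name_b.split()]
    if not tokens_a or not tokens_b:
        return (0, 0)  # nothing to compare
    matches = len(set(tokens_a) & set(tokens_b))
    return (100 * matches // len(tokens_a), 100 * matches // len(tokens_b))


exact = token_scores("Bill Clinton", "William Jefferson Clinton")
relaxed = token_scores("Bill Clinton", "William Jefferson Clinton",
                       use_nicknames=True)
print(exact, relaxed)
```

With exact matching only Clinton agrees, giving 50 and 33; treating Bill and William as equivalent gives 100 and 66.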
Which score or scores are used to determine the likelihood
of a match depends on the degree of accuracy your system requires.
Scoring is essential in on-line applications, where the number of
records returned is too great for a person to scan, and in batch
utilities, where decisions based on the likelihood of a match
invoke automated processes.
NameSearch Comparison Algorithms
ALFACOMP
This routine is used to compare fields containing multi-word
strings. The ALFACOMP routine is based solely on a heuristic
algorithm and is not dependent on rulebase expertise.
DATESCR
The Date Score is used to compare two dates. The
DATESCR comparison routine uses rulebase expertise to
arrive at its results. For example, July 28, 1965 compared to 7/28/65
would yield a score of 100. In this manner the DATESCR routine overcomes
problems due to inconsistency in date format. The routine also accepts
several parameters that dictate the penalty for mismatches in
the year. By increasing these settings, the score can be made more
tolerant. For example, if you want all dates that correspond to
July 28, 1965 plus or minus two years, you would set the year range
to 2. If you want plus or minus five years, you would set the year
range to 5. In this manner NameSearch gives you the ability to widen
or narrow the range of dates being returned, provided the month and
day agree.
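The effect of a year range can be illustrated with a short Python sketch. It is not the DATESCR routine: the names parse_date and date_score, the two accepted date formats, the assumption that two-digit years fall in the 1900s, and the penalty of 10 points per year of difference are all hypothetical choices made for the illustration.

```python
from datetime import date, datetime


def parse_date(text):
    """Parse 'July 28, 1965' or '7/28/65' style dates (illustrative only)."""
    try:
        return datetime.strptime(text, "%B %d, %Y").date()
    except ValueError:
        pass
    month, day, year = (int(part) for part in text.split("/"))
    if year < 100:
        year += 1900  # assumption: two-digit years belong to the 1900s
    return date(year, month, day)


def date_score(text_a, text_b, year_range=0, penalty_per_year=10):
    """Score two dates from 0 to 100.

    Month and day must agree. The years may differ by up to year_range,
    and each year of difference costs penalty_per_year points
    (a hypothetical penalty scheme).
    """
    a, b = parse_date(text_a), parse_date(text_b)
    if (a.month, a.day) != (b.month, b.day):
        return 0
    year_gap = abs(a.year - b.year)
    if year_gap > year_range:
        return 0
    return max(0, 100 - year_gap * penalty_per_year)


same_date = date_score("July 28, 1965", "7/28/65")
two_years_off = date_score("July 28, 1965", "7/28/67", year_range=2)
out_of_range = date_score("July 28, 1965", "7/28/67", year_range=1)
print(same_date, two_years_off, out_of_range)
```

Widening year_range admits more candidate dates, while the per-year penalty keeps closer years ranked above more distant ones.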
NUMCOMP
This is a comparison routine used for evaluating alphanumeric
strings. For example, this routine is well suited to Social
Security number comparisons.
COMP, COMP1, COMP2
These are NameSearch’s comparison routines
used for scoring names and addresses. These routines use NameSearch’s
rulebase expertise and phonetic tokenization to determine scores.
COMP was the original comparison routine, released with Version
1 of the NameSearch product. COMP1 was introduced in Version 2.0
to provide a more representative score. COMP2 was added in Version
2.5; this routine uses ALFACOMP, an advanced heuristic algorithm,
to arrive at its results.