NameSearch® comparison algorithms determine numeric values that
represent the likelihood of a match. Two entities are passed to the
comparison routine, and a score or an array of scores is returned.
There are many ways of calculating a score. For example, William
Jefferson Clinton versus Bill Clinton could yield scores of 100, 50,
66 and 33. The basic approach is to divide the number of matches by
the number of tokens and multiply that figure by 100. In this example
the number of tokens is either two or three words: Bill Clinton has
two tokens and William Jefferson Clinton has three. The only word that
matches exactly is Clinton. Dividing the one match by each token count
and multiplying by 100 gives (1/2 * 100 = 50) and (1/3 * 100 = 33).
Alternatively, Bill and William are used interchangeably and can be
considered to match. In this instance the scores would be
(2/2 * 100 = 100) and (2/3 * 100 = 66), with the fractional part
dropped.
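The arithmetic above (matches divided by tokens, times 100) can be sketched in a few lines of Python. This is only an illustration of the worked example, not the NameSearch implementation: the function names and the one-entry nickname table are assumptions made for the sketch, and both names are assumed to be non-empty.

```python
# Illustrative only: NICKNAMES, normalize and token_scores are hypothetical
# names, not part of the NameSearch product.
NICKNAMES = {"bill": "william"}  # assumed equivalence table for the example


def normalize(token, use_nicknames):
    """Lower-case a token and optionally map a nickname to its formal form."""
    token = token.lower()
    return NICKNAMES.get(token, token) if use_nicknames else token


def token_scores(name_a, name_b, use_nicknames=False):
    """Return (score relative to name_a, score relative to name_b)."""
    tokens_a = [normalize(t, use_nicknames) for t in name_a.split()]
    tokens_b = [normalize(t, use_nicknames) for t in name_b.split()]
    matches = len(set(tokens_a) & set(tokens_b))
    # Integer division drops the fraction, so 2/3 * 100 is reported as 66.
    return (matches * 100 // len(tokens_a), matches * 100 // len(tokens_b))


print(token_scores("Bill Clinton", "William Jefferson Clinton"))
# (50, 33)
print(token_scores("Bill Clinton", "William Jefferson Clinton", use_nicknames=True))
# (100, 66)
```

Returning one score per name mirrors the "array of scores" idea: the same number of matches looks stronger against the shorter name than against the longer one.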
How the score or scores are used to determine the likelihood of a
match depends on the degree of accuracy required by your system.
Scoring is essential in on-line applications, where the number of
records returned is too great for a person to scan, and in batch
utilities, where decisions based on the likelihood of a match invoke
automated processes.
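As a rough illustration of how a batch utility might act on a score, the sketch below maps a score to one of three outcomes. The cutoff values and the outcome labels are hypothetical; suitable values depend entirely on the accuracy your system requires.

```python
# Hypothetical cutoffs chosen for illustration only.
AUTO_ACCEPT = 90  # at or above this, treat the pair as a match automatically
REVIEW = 60       # at or above this, route the pair to a person for review


def classify(score, auto_accept=AUTO_ACCEPT, review=REVIEW):
    """Turn a 0-100 comparison score into a processing decision."""
    if score >= auto_accept:
        return "match"
    if score >= review:
        return "review"
    return "no match"


for score in (100, 66, 33):
    print(score, classify(score))
```

Raising the cutoffs makes the automated process stricter and sends more pairs to review or rejection; lowering them does the opposite.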
NameSearch comparison routines:
ALFACOMP - This routine is used to compare fields containing multi-word
strings. The ALFACOMP routine is based solely on a heuristic algorithm
and is not dependent on rulebase expertise.
DATESCR - The Date Score routine is used to compare two dates. The
DATESCR comparison routine uses rulebase expertise to arrive at its
results. For example, July 28, 1965 compared to 7/28/66 would yield a
score of 100. In this manner the DATESCR routine overcomes problems
due to inconsistency in date format. The routine also accepts several
parameters that dictate the penalty for mismatches in the year. By
increasing these settings the score can be made more tolerant. For
example, if you want all dates within two years of July 28, 1965, you
would set the year range to 2; for plus or minus five years, you would
set the year range to 5. In this manner NameSearch gives you the
ability to widen or narrow the range of dates being returned, provided
that the month and day agree.
NUMCOMP - This comparison routine is used for evaluating alphanumeric
strings. For example, it is well suited to Social Security number
comparisons.
COMP, COMP1, COMP2 - These are NameSearch's comparison routines used
for scoring names and addresses. These routines utilize NameSearch's
rulebase expertise and phonetic tokenization to determine scores. COMP
was the original comparison routine released with Version 1 of the
NameSearch product. COMP1 was introduced in Version 2.0 in order to
provide a more representative score. COMP2 was added in Version 2.5.
This routine uses ALFACOMP, an advanced heuristic algorithm, to arrive
at its results.
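The year-range idea described for DATESCR can be sketched as follows. This is not the DATESCR algorithm: the list of accepted formats, the function names, the flat per-year penalty and the score of 0 for a month or day mismatch are all assumptions made for illustration. Two-digit years are left out because their century would have to be guessed, and parsing the month name with %B assumes an English locale.

```python
from datetime import datetime

# Hypothetical set of accepted layouts; the formats the real routine
# recognizes are not listed in this document.
FORMATS = ("%B %d, %Y", "%m/%d/%Y", "%Y-%m-%d")


def parse_date(text):
    """Try each known layout in turn and return a date object."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def date_score(a, b, year_range=0, year_penalty=10):
    """Score two dates from 0 to 100, tolerating a year gap up to year_range."""
    d1, d2 = parse_date(a), parse_date(b)
    # Assumed rule: the month and day must agree before the year is considered.
    if (d1.month, d1.day) != (d2.month, d2.day):
        return 0
    year_gap = abs(d1.year - d2.year)
    if year_gap > year_range:
        return 0
    # Assumed penalty: a fixed number of points per year of difference.
    return max(0, 100 - year_penalty * year_gap)


print(date_score("July 28, 1965", "7/28/1965"))                # 100
print(date_score("July 28, 1965", "7/28/1967", year_range=2))  # 80
print(date_score("July 28, 1965", "7/28/1967"))                # 0
```

Widening year_range admits more candidate dates, while year_penalty controls how quickly the score falls as the years drift apart.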
NameSearch® General Information