NameSearch® (White Paper)
Intelligent Search Technology's NameSearch® product
is a tool that lets you find records by name (personal
and corporate), address and/or other identifying information. This
software improves the quality of searches while minimizing
I/O expense.
The first aspect of NameSearch® is intelligent
key and range building. This facility retrieves records
regardless of variations
caused by phonetics, transcription or keyboarding errors, nicknames,
short forms, missing words, extra words, noise and sequence variations.
Names and addresses suffer from a skewed distribution. A few words represent
the majority of names, while large volumes of uncommon names exist but occur
infrequently. This is most dramatically illustrated by the analysis of people's
names in the United States. While there are 2.5 million last names and over
3.2 million first names, three hundred surnames represent thirty-five percent
of the population, while over sixty-five percent of the population has one
of four hundred first names. The skew and distribution of company names
and street addresses are just as extreme. Inquiries usually follow
a distribution pattern similar to that of the name population in the database.
The problems of skew and distribution are further complicated by the way name
frequencies vary with geographical location and with the type of information
stored in the database.
Traditional solutions to name variation deal only
with phonetic errors. These solutions standardize
easily confused
sounds. For example, PH's are treated as F's. Elaborate linguistic
rules are generated to tokenize a name phonetically, and the
tokenized words serve as the basis for name retrieval. In some
instances these rules help find names which are hard to spell.
Unfortunately, the
distribution pattern of common names becomes even more skewed. For
example, inquiries on John also return Joan, Jim, Jane, Jimmy,
Jenn and other names
which fall in the "JAN" phonetic pattern. Aggravating the skew
in the distribution of names sacrifices both quality and performance.
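The over-grouping described above can be reproduced with American Soundex, a widely known phonetic code. Soundex is used here only as a stand-in: the paper does not name the scheme behind the "JAN" pattern, and this is not the NameSearch® algorithm. A minimal sketch:

```python
# American Soundex, used as a well-known example of phonetic tokenization.
# Assumption: this is a stand-in, not the scheme the paper describes.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate identical codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(result) + "000")[:4]


names = ["John", "Joan", "Jim", "Jane", "Jimmy", "Jenn"]
print({n: soundex(n) for n in names})
```

All six names receive the same code, J500, so an inquiry on any one of them retrieves every record filed under the others. For very common names this multiplies the number of candidates that must be read and compared.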
Discrepancies caused by phonetic errors account for
twenty to twenty-five percent of all name variations. Intelligent
Search Technology addresses
problems due to phonetics by employing analysis routines to determine
when phonetic tokenization should be applied. This enables NameSearch® to
overcome problems due to phonetics without the negative consequences
incurred with other methods of name search.
Many name variations are caused by the use of nicknames.
Names like Bill and William, or Bob and Robert, are used interchangeably
to identify individuals.
NameSearch® uses rule-based expertise to solve this class of problems.
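A nickname rule of this kind can be pictured as a lookup from each variant to one canonical form. The sketch below is illustrative only: the table contents and the function names are hypothetical, and the actual NameSearch® rule base is not shown in this paper.

```python
# Hypothetical nickname table (not the NameSearch rule base).
NICKNAMES = {
    "bill": "william",
    "will": "william",
    "bob": "robert",
    "rob": "robert",
}


def canonical_first_name(name: str) -> str:
    """Map a given name to its canonical form, if a rule exists."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_given_name(a: str, b: str) -> bool:
    """True when both names resolve to the same canonical form."""
    return canonical_first_name(a) == canonical_first_name(b)


print(same_given_name("Bill", "William"))
print(same_given_name("Bob", "Robert"))
print(same_given_name("Bill", "Robert"))
```

Unlike a phonetic code, a table like this only links names that a rule explicitly relates, so it does not widen the candidate set for unrelated names.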
The NameSearch® rule base is also used to identify
noise words. Noise words are elements in a name which do not
help in the identification of
a candidate. Examples of noise words are Incorporated, Corporation,
Limited, Junior, Senior, Avenue and Street. Sometimes
elements
in a name contribute to the identity but should be treated as less
important. In these cases, the rule base does not treat them
as noise words but recognizes
that they are less significant. Some examples are associate, board,
international and services.
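The distinction between noise words and less significant words can be sketched as a weighting step. The word lists below come from the examples above; the weight value and the function name are assumptions made for illustration, not NameSearch® parameters.

```python
# Word lists taken from the examples in the text.
NOISE_WORDS = {"incorporated", "corporation", "limited",
               "junior", "senior", "avenue", "street"}
LOW_WEIGHT_WORDS = {"associate", "board", "international", "services"}

LOW_WEIGHT = 0.25  # assumed value, chosen only for illustration


def weighted_tokens(name: str) -> dict:
    """Drop noise words and down-weight less significant words."""
    weights = {}
    for raw in name.lower().split():
        token = raw.strip(".,")
        if not token or token in NOISE_WORDS:
            continue  # noise words contribute nothing to identity
        weights[token] = LOW_WEIGHT if token in LOW_WEIGHT_WORDS else 1.0
    return weights


print(weighted_tokens("Acme International Services Corporation"))
```

Here "Corporation" is discarded, while "International" and "Services" are kept at a reduced weight so that "Acme" dominates the comparison.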
The rule base also contains rules for handling common prefixes. Names like
McDaniel are easily confused with MacDaniel. Prefix recognition handles
this class of problems.
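Prefix recognition can be sketched as folding prefix variants into a single form before keys are built. The rule below covers only the Mc/Mac case from the text; the function name and the choice of "mac" as the canonical form are assumptions.

```python
def normalize_prefix(surname: str) -> str:
    """Fold the 'Mc' prefix into 'Mac' so both spellings share a key."""
    s = surname.strip().lower()
    # Only rewrite when something follows the prefix.
    if s.startswith("mc") and len(s) > 2:
        return "mac" + s[2:]
    return s


print(normalize_prefix("McDaniel") == normalize_prefix("MacDaniel"))
```

Rewriting only the shorter "Mc" form leaves names such as "Mack" or "Macy" untouched, which avoids having to decide whether a leading "Mac" is really a prefix.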
Another feature of the rule base is diminutive recognition.
Names frequently end in a diminutive such as "ie" or "y".
In these cases, it is useful to identify the root and apply the
rules to it. For example, you would want Bill, Billie and Billy to find
William or Willie.
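Diminutive recognition can be sketched as stripping the ending to recover a root and then looking the root up in a nickname table. The table, the minimum root length and the function names are hypothetical; the sketch only illustrates the idea described above.

```python
# Hypothetical nickname table (not the NameSearch rule base).
NICKNAMES = {"bill": "william", "will": "william"}
DIMINUTIVE_SUFFIXES = ("ie", "y")
MIN_ROOT_LENGTH = 3  # assumed guard against over-stripping short names


def strip_diminutive(name: str) -> str:
    """Remove a trailing diminutive if a plausible root remains."""
    s = name.lower()
    for suffix in DIMINUTIVE_SUFFIXES:
        if s.endswith(suffix) and len(s) - len(suffix) >= MIN_ROOT_LENGTH:
            return s[:-len(suffix)]
    return s


def resolve(name: str) -> str:
    """Resolve a name through the nickname table, trying its root too."""
    s = name.strip().lower()
    if s in NICKNAMES:
        return NICKNAMES[s]
    root = strip_diminutive(s)
    # Fall back to the original spelling when the root has no rule.
    return NICKNAMES.get(root, s)


for n in ["Bill", "Billie", "Billy", "Willie", "William", "Mary"]:
    print(n, "->", resolve(n))
```

Falling back to the original spelling matters: "Mary" ends in "y", but its stripped root has no rule, so the name is left unchanged rather than being truncated.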
NameSearch® comes with an extensive predefined
set of rules. These rules can be used right out of the box or
modified to meet your specific
needs. Modifications are made through the NameSearch® Generation
Shell.
The Generation Shell is a graphical user interface designed
for the modification and tuning of your NameSearch® subroutines. The
Shell allows you to adjust frequency and rule base tables, set
various parameters, modify key
building routines and test changes.
In addition to key building, the NameSearch® software
comes with advanced comparison functions. These functions use
the strength of the key building
routines to calculate numeric values indicating the
likelihood of a match.
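To make the idea of a numeric likelihood concrete, the sketch below scores two names from 0 to 100 using Python's standard `difflib.SequenceMatcher`. This is a generic string-similarity technique swapped in for illustration; the paper does not describe how the NameSearch® comparison functions compute their values, and `match_score` is a hypothetical name.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> int:
    """Return a 0-100 similarity score that ignores word order."""
    tokens_a = a.lower().split()
    tokens_b = b.lower().split()
    if not tokens_a or not tokens_b:
        return 0

    def best(token, others):
        # Similarity of a token to its closest counterpart.
        return max(SequenceMatcher(None, token, o).ratio() for o in others)

    # Average in both directions so extra or missing words lower the score.
    forward = sum(best(t, tokens_b) for t in tokens_a) / len(tokens_a)
    backward = sum(best(t, tokens_a) for t in tokens_b) / len(tokens_b)
    return round(100 * (forward + backward) / 2)


print(match_score("John Smith", "Smith John"))
print(match_score("John Smith", "Jon Smyth"))
print(match_score("John Smith", "Mary Jones"))
```

Matching each word against its closest counterpart makes the score insensitive to sequence variations, while keyboarding errors reduce it gradually instead of causing an outright miss.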
These comparison routines can be used to eliminate candidates
in an on-line system, making it possible to tailor the information
displayed. This is especially useful for systems containing more than ten
million records. In addition, the comparison routines form the basis of
batch utilities, such as merge/purge applications. These comparison routines
enable systems to make decisions without human intervention.
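Both uses described above reduce to applying cutoffs to a match score. The sketch below assumes scores on a 0 to 100 scale; the cutoff values, record identifiers and function names are hypothetical and would need tuning against real data.

```python
DISPLAY_CUTOFF = 60   # assumed: below this, a candidate is not shown
MERGE_THRESHOLD = 90  # assumed: at or above this, treat as a duplicate


def filter_candidates(scored, cutoff=DISPLAY_CUTOFF):
    """Keep (record_id, score) pairs at or above the cutoff, best first."""
    kept = [(rid, score) for rid, score in scored if score >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def is_duplicate(score, threshold=MERGE_THRESHOLD):
    """Batch merge/purge decision made without human review."""
    return score >= threshold


# Hypothetical scores for four candidate records.
scored = [("R1", 95), ("R2", 40), ("R3", 72), ("R4", 61)]
print(filter_candidates(scored))
print([rid for rid, score in scored if is_duplicate(score)])
```

An on-line system would typically use the lower cutoff to limit what an operator sees, while a batch utility would rely on the stricter threshold because no person reviews the result.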
NameSearch® integrates various strands of knowledge to form a cohesive
fabric enabling successful retrieval of records based on names and/or addresses.
By combining rules on common prefixes, suffixes, nicknames, noise words
and other similar classes of variations with Intelligent Search
Technology's phonetics mechanism and its user-friendly Generation Shell,
NameSearch® makes the complexities of name search easy.