Search keys are built after sanitization, word recognition and phonetic
tokenization. Every database record must contain or be indexed by at
least one NameSearch key. A key loading utility must be written to
populate an index or database. The utility will sequentially read records,
pass the names to the NameSearch key building function and store the
returned keys.
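
The loading step described above can be sketched in Python. This is only
an illustration under stated assumptions: build_keys is a hypothetical
stand-in for the vendor key building function (whose real name and
signature are not given here), and the in-memory dictionary stands in for
whatever index or database the utility would populate.

```python
from collections import defaultdict


def build_keys(name):
    """Hypothetical stand-in for the vendor key building function.

    Assumption: returns one upper-cased key per word. The real function
    also applies sanitization, word recognition and phonetic tokenization.
    """
    return [word.upper() for word in name.split()]


def load_keys(records, key_builder=build_keys):
    """Read records sequentially and store each record ID under every key."""
    index = defaultdict(set)
    for record_id, name in records:
        for key in key_builder(name):
            index[key].add(record_id)
    return index


# Illustrative records only; the IDs are made up for this sketch.
records = [(1, "Frank Lee"), (2, "Kemper Insurance Company")]
index = load_keys(records)
```

After loading, index["LEE"] holds record 1 and index["KEMPER"] holds
record 2, so every record is reachable through at least one key.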
Many search problems are caused by sequence variations. The inability
to determine the order of words for a particular entity occurs at both
data entry and inquiry time. The name Frank Lee, for example, could
have been entered as Lee Frank. This problem is particularly pervasive in company
names. Names such as International Business Machines, Andersen Consulting
and Kemper Insurance Company are examples where the left-most word
is most significant. Conversely, Edward S. Gordan Real Estate Company
and Paul Mitchell hair products are examples where the left-most word
is less significant. The inability to predict which word position
holds the significant name causes many searches to fail.
Merging foreign database files causes other sequence variations. This
frequently occurs when external lists are purchased or companies consolidate
information. Inconsistent methodologies for data capture make the standardization
of name fields impossible. Aggravating the sequence problem are those
instances in which company names are intermixed with personal names.
All of these factors, in addition to human error, contribute to identification
problems caused by sequence variations. NameSearch provides a facility
for handling these problems. The key building function returns a set
of permuted keys. To solve search problems caused by sequence variation,
these permuted keys are used to index your database.
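
One plausible reading of "permuted keys" is a set of keys in which each
word of the name takes the leading position once. The sketch below is an
assumption made for illustration, not the documented behavior of the
product: permuted_keys is a hypothetical function, and it omits the
sanitization and phonetic steps that precede key building.

```python
def permuted_keys(name):
    """Return one key per word, rotating each word into the lead position.

    Hypothetical illustration: the actual key format is not specified here.
    """
    words = name.upper().split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


keys = permuted_keys("Frank Lee")
# keys == ['FRANK LEE', 'LEE FRANK']
```

Indexing a record under every key in this set means the record can be
found no matter which word the inquiry starts with.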
To understand how these keys are used, we will draw an analogy between
a telephone book and a database system. When we look for Frank Lee,
we search the "L" section. If the name is not there, we continue
the search by looking in the "F" section. To find Frank Lee, we may
have to search two separate sections of the phone book.
Suppose we were looking for Frank Lee Ray. To ensure success, we must
search under every permutation of the three words. This is an extremely
arduous and time-consuming process for both people and computers.
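
To see why searching every permutation at inquiry time is costly, the
following sketch enumerates the orderings that would have to be tried for
a three-word name. The function name query_orderings is hypothetical; the
enumeration uses the Python standard library.

```python
from itertools import permutations


def query_orderings(name):
    """List every word ordering that an inquiry would have to try."""
    words = name.split()
    return [" ".join(p) for p in permutations(words)]


orderings = query_orderings("Frank Lee Ray")
# Three words give 3! = 6 orderings; four words would give 24.
```

The number of orderings grows factorially with the number of words, which
is why the work is better done once, when the keys are loaded, than on
every inquiry.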
By listing Frank Lee in both the L and F sections, regardless of order,
only one section would need to be searched. The one disadvantage of
storing multiple listings is the expense of storage.
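
The trade-off can be illustrated with a toy phone book. In this sketch,
which assumes hypothetical helper names and a simple first-letter
sectioning, each name is listed under the initial of every one of its
words, so an inquiry only opens the section for its own first word.

```python
from collections import defaultdict


def build_phone_book(names):
    """List each name under the initial letter of every one of its words."""
    book = defaultdict(list)
    for name in names:
        for letter in sorted({w[0].upper() for w in name.split()}):
            book[letter].append(name)
    return book


def find(book, query):
    """Search only the section for the first word of a non-empty query."""
    wanted = sorted(w.upper() for w in query.split())
    section = book.get(query.split()[0][0].upper(), [])
    return [n for n in section
            if sorted(w.upper() for w in n.split()) == wanted]


book = build_phone_book(["Frank Lee", "Paul Mitchell"])
found = find(book, "Lee Frank")                 # one section searched
listings = sum(len(v) for v in book.values())   # storage actually used
```

Here two names occupy four listings, which is the storage expense noted
above, while both "Frank Lee" and "Lee Frank" are found in a single
section.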
Technical Product Information