NameSearch Search Key Building Product Information
Search keys are built after sanitization, word recognition and
phonetic tokenization. Every database record must contain, or be
indexed by, at least one NameSearch key. To populate an index or
database, you must write a key loading utility. The utility reads
records sequentially, passes each name to the NameSearch key
building function and stores the returned keys.
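The loading utility described above can be sketched as follows. This is a minimal sketch that assumes Python's built-in `sqlite3` module as the index store. `build_namesearch_keys` is a hypothetical placeholder for the real NameSearch key building function, whose actual name and signature are not given here. It only upper-cases the name and strips punctuation; it does not perform NameSearch's sanitization, word recognition or phonetic tokenization. The table and column names are likewise illustrative.

```python
import re
import sqlite3
from typing import Iterable, List, Tuple


def build_namesearch_keys(name: str) -> List[str]:
    """Hypothetical placeholder for the NameSearch key building function.

    The real function sanitizes, recognizes words and tokenizes phonetically;
    this stand-in only upper-cases the name and drops punctuation.
    """
    words = re.findall(r"[A-Z0-9]+", name.upper())
    return [" ".join(words)] if words else []


def load_keys(records: Iterable[Tuple[int, str]],
              conn: sqlite3.Connection) -> Tuple[int, List[int]]:
    """Read records in sequence, build keys for each name and store them.

    Returns the number of key rows written and the ids of records that
    produced no key (every record must be indexed by at least one key).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS name_keys "
                 "(search_key TEXT NOT NULL, record_id INTEGER NOT NULL)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_search_key "
                 "ON name_keys (search_key)")
    written, missing = 0, []
    for record_id, name in records:
        keys = build_namesearch_keys(name)
        if not keys:
            missing.append(record_id)  # flag records that would be unsearchable
        for key in keys:
            conn.execute("INSERT INTO name_keys (search_key, record_id) "
                         "VALUES (?, ?)", (key, record_id))
            written += 1
    conn.commit()
    return written, missing
```

Records that yield no key are returned separately so they can be reviewed, since each record must be indexed by at least one key.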
Many search problems are caused by sequence variations: the
inability to determine the order of the words in a particular name
arises both at data entry and at inquiry time. The name Frank Lee,
for example, could have been entered as Lee Frank. This problem is
particularly pervasive in company names. Names such as International
Business Machines, Anderson Consulting and Kemper Insurance Company
are examples where the left-most word is most significant. Conversely,
Edward S. Gordan Real Estate Company and Paul Mitchell hair products
are examples where the left-most word is less significant. Because
the significant word cannot be predicted from its position in the
name, many searches fail.
Merging database files from outside sources causes other sequence
variations. This frequently occurs when external lists are purchased
or when companies consolidate information. Inconsistent data capture
methods make it impossible to standardize name fields. The sequence
problem is aggravated where company names are intermixed with
personal names. All of these factors, in addition to human error,
contribute to identification problems caused by sequence variations.
NameSearch provides a facility for handling these problems: the key
building function returns a set of permuted keys, and using these
permuted keys to index your database solves search problems caused
by sequence variation.
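This section does not specify exactly which permutations NameSearch returns. As an assumption for illustration, the sketch below generates one key per distinct ordering of a name's words and files the record under each key, so an inquiry for Lee Frank finds the record entered as Frank Lee. `permuted_keys` and `index_record` are hypothetical names, and the input words are assumed to be already sanitized and tokenized.

```python
from itertools import permutations
from typing import Dict, List, Set


def permuted_keys(words: List[str]) -> List[str]:
    """Return one key per distinct ordering of the words (assumed key set)."""
    keys: List[str] = []
    seen: Set[str] = set()
    for order in permutations(words):
        key = " ".join(order)
        if key not in seen:  # repeated words would otherwise yield duplicate keys
            seen.add(key)
            keys.append(key)
    return keys


def index_record(index: Dict[str, Set[int]], record_id: int,
                 words: List[str]) -> None:
    """File the record under every permuted key of its name."""
    for key in permuted_keys(words):
        index.setdefault(key, set()).add(record_id)
```

An n-word name has n! orderings (2 for Frank Lee, 120 for a five-word company name), so generating every permutation grows quickly; a real key builder may need to limit the set, and how NameSearch does so is not described here.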
To understand how these keys are used, consider an analogy between a
telephone book and a database system. When we look for Frank Lee, we
search the "L" section. If the name is not there, we continue the
search in the "F" section. To find Frank Lee, we had to search two
separate sections of the phone book. Suppose instead we were looking
for Frank Lee Ray. To ensure success, we would have to search under
every ordering of the three words, six permutations in all. This is
an extremely arduous and time-consuming process for both people
and computers.
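To make the cost of searching without permuted keys concrete, the sketch below models an index that stores each name in a single word order and an inquiry that probes every ordering of the words until it finds a match. The function name `search_all_orderings` and the dictionary-based index are illustrative assumptions, not part of NameSearch.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple


def search_all_orderings(index: Dict[str, int],
                         words: List[str]) -> Tuple[Optional[int], int]:
    """Probe a single-order index with each ordering of the inquiry words.

    Returns (record id or None, number of probes made).
    """
    probes = 0
    for order in permutations(words):
        probes += 1
        record_id = index.get(" ".join(order))
        if record_id is not None:
            return record_id, probes
    return None, probes  # every ordering was tried and none matched
```

For a record stored as RAY FRANK LEE, an inquiry for Frank Lee Ray needs five probes before it succeeds, and a name that is not in the index costs all six.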
If Frank Lee were listed in both the L and F sections, only one
section would need to be searched, regardless of the order in which
the name was given. The one disadvantage of storing multiple listings
is the additional storage they require.
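The phone-book remedy, listing a name once under each of its words, can be sketched as a small in-memory index. This follows the analogy rather than NameSearch's own key format, and the class and method names are hypothetical. Each record is stored under every distinct word of its name; a lookup searches only the section for the first word as entered and keeps the records whose names contain all the inquiry words. The `listings` counter tracks the storage cost the text mentions.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class ListingIndex:
    """Phone-book style index: each name is listed under every one of its words."""

    def __init__(self) -> None:
        # word -> ids of records listed in that word's "section"
        self.sections: DefaultDict[str, Set[int]] = defaultdict(set)
        # record id -> the set of words in its name, used to confirm a match
        self.names: Dict[int, Set[str]] = {}
        # total listings stored: the storage cost of multiple listings
        self.listings = 0

    def add(self, record_id: int, words: List[str]) -> None:
        distinct = set(words)
        self.names[record_id] = distinct
        for word in distinct:
            self.sections[word].add(record_id)
            self.listings += 1

    def find(self, words: List[str]) -> Set[int]:
        """Search one section (first word as entered) for records with all words."""
        if not words:
            return set()
        wanted = set(words)
        candidates = self.sections.get(words[0], set())
        return {rid for rid in candidates if wanted <= self.names[rid]}
```

Two two-word names produce four listings, twice what a single-order index would store, but either ordering of Frank Lee is found by searching a single section.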