Sanitization
The sanitization module removes noise characters, extra spaces, and control characters, and converts lowercase letters to uppercase. Examples of noise characters are: @, #, $, %, ^, &, *, (, ), }, {, [, ]. The following characters are handled separately and have special meanings: commas, hyphens, and quotes. A comma usually marks a last name written ahead of the rest of the name, so sanitization moves each word followed by a comma to the end of the string. Quotes are deleted and the characters on either side are joined without a space. Hyphens are replaced by a space. Examples of sanitization:
| Before Sanitization | After Sanitization |
| Scott Lions | SCOTT LIONS |
| Smith, John F. | JOHN F SMITH |
| Rose Stone-Shield | ROSE STONE SHIELD |
| James O'Tool | JAMES OTOOL |
| Jim O. Tool | JAMES OTOOL |
| Owen, Tool, James | JAMES OWEN TOOL |
| # Williams , $Richard | RICHARD WILLIAMS |
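The basic rules described above can be sketched in Python. This is only an illustration, not NameSearch's actual implementation: the function name `sanitize`, the order of the steps, and the choice to treat every character other than A-Z, 0-9, space, and comma as noise (which covers periods, control characters, and any non-ASCII letters) are assumptions. It does not attempt the "Jim O. Tool" row, which the rules stated above do not account for.

```python
import re


def sanitize(name: str) -> str:
    """Illustrative sketch of the basic sanitization rules (no rulebase)."""
    text = name.upper()                # lowercase -> uppercase
    text = text.replace("-", " ")      # a space replaces each hyphen
    # Quotes are deleted so the neighbouring characters join: O'TOOL -> OTOOL.
    text = re.sub("['\"\u2018\u2019\u201c\u201d]", "", text)
    # Assumption: anything that is not A-Z, 0-9, space, or comma is noise
    # and becomes a space (this also covers control characters).
    text = re.sub(r"[^A-Z0-9, ]", " ", text)
    # Attach each comma to the word before it, then split on whitespace,
    # which also collapses extra spaces.
    text = re.sub(r"\s*,\s*", ", ", text)
    kept, moved = [], []
    for token in text.split():
        if token.endswith(","):
            word = token.rstrip(",")
            if word:                   # ignore stray commas with no word
                moved.append(word)
        else:
            kept.append(token)
    # Words that were followed by a comma go to the end of the string.
    return " ".join(kept + moved)


print(sanitize("Smith, John F."))         # JOHN F SMITH
print(sanitize("# Williams , $Richard"))  # RICHARD WILLIAMS
```

Replacing noise with a space rather than deleting it is a design choice made so that an input such as `c\o` yields `C O`, which is the behaviour described for sanitization without the rulebase.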
The sanitization module also contains a small rulebase. The rulebase is applied after all alphabetic characters have been converted to uppercase and extra blanks have been removed. It is used to recognize words that contain noise characters or prefixes that could be affected by the sanitization process. The sanitization rulebase also lets you convert non-alphanumeric characters to other symbols or words.
The First Word rule type was designed for commercial name searches, where a word in the first position of a name may be considered noise. There are times when a word in the middle of a commercial or corporate name helps identify a record, but the same word in the first position would obscure the search. Classifying noise words by position can affect NameSearch's ability to overcome sequence variations, so this rule type should be applied judiciously and with careful thought. The sanitization rulebase can be easily modified using the NameSearch graphical user interface, the "Generation Shell."
| Before Sanitization | After Sanitization | Sanitization (without rulebase expertise) |
| c\o | CARE OF | C O |
| Mc Donald, Old | OLD MCDONALD | MC OLD DONALD |
| % | CARE OF | |
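To illustrate how a rulebase pass might sit in front of the basic rules, here is a minimal Python sketch. The rule tables, their format, and all function names are hypothetical; the actual rulebase format and the Generation Shell are not shown in this text. The `THE` entry for the First Word rule is an assumed example only.

```python
import re

# Hypothetical rule tables; the real rulebase format is not documented here.
REPLACEMENTS = {"C\\O": "CARE OF", "%": "CARE OF"}  # whole token -> words
PREFIXES = {"MC"}            # prefix tokens joined to the following word
FIRST_WORD_NOISE = {"THE"}   # assumed entry: dropped only in first position


def apply_rulebase(name: str) -> str:
    """Apply rules after upper-casing and removing extra blanks."""
    tokens = name.upper().split()
    # First Word rule: the word is noise only when it leads the name.
    if len(tokens) > 1 and tokens[0] in FIRST_WORD_NOISE:
        tokens = tokens[1:]
    out = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in REPLACEMENTS:
            out.append(REPLACEMENTS[token])
        elif token in PREFIXES and i + 1 < len(tokens):
            out.append(token + tokens[i + 1])  # MC + DONALD, -> MCDONALD,
            i += 1                             # skip the word just joined
        else:
            out.append(token)
        i += 1
    return " ".join(out)


def basic_sanitize(text: str) -> str:
    """Basic rules only: hyphens, quotes, noise, and comma reordering."""
    text = text.upper().replace("-", " ")
    text = re.sub("['\"]", "", text)
    text = re.sub(r"[^A-Z0-9, ]", " ", text)   # assumed noise definition
    text = re.sub(r"\s*,\s*", ", ", text)
    tokens = text.split()
    kept = [t for t in tokens if not t.endswith(",")]
    moved = [t.rstrip(",") for t in tokens if t.endswith(",") and t.rstrip(",")]
    return " ".join(kept + moved)


def sanitize_with_rulebase(name: str) -> str:
    return basic_sanitize(apply_rulebase(name))


print(sanitize_with_rulebase("Mc Donald, Old"))  # OLD MCDONALD
print(basic_sanitize("Mc Donald, Old"))          # MC OLD DONALD
```

Running the rules before noise removal is what lets `c\o` and `%` be recognized as whole tokens; once the backslash or percent sign has been stripped, there is nothing left for a rule to match.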
NameSearch® General Information