The sanitization module removes noise characters, extra spaces, and control characters, and converts lowercase letters to uppercase. Examples of noise characters are: @, #, $, %, ^, &, *, (, ), }, {, [, ].
The following characters are handled separately and have special meanings: commas, hyphens, and quotes. Commas usually indicate that a last name has been placed ahead of the rest of the name. Sanitization moves words followed by commas to the end of the string. Quotes are deleted, and the parts they separated are joined without a space. Hyphens are replaced by a space.
Examples of Sanitization:
| Before Sanitization | After Sanitization |
| Scott Lions | SCOTT LIONS |
| Smith, John F. | JOHN F SMITH |
| Rose Stone-Shield | ROSE STONE SHIELD |
| James O'Tool | JAMES OTOOL |
| Jim O. Tool | JAMES OTOOL |
| Owen, Tool, James | JAMES OWEN TOOL |
| # Williams , $Richard | RICHARD WILLIAMS |
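The steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NameSearch implementation: the function name `sanitize` and the `NOISE` set are hypothetical, the period and backslash are assumed to be noise characters (the examples show "F." becoming "F"), and noise characters are assumed to be replaced by a space before extra spaces are collapsed.

```python
import re

# Noise characters listed in the text, plus "." and "\" (both assumed).
NOISE = set("@#$%^&*(){}[].\\")


def sanitize(name: str) -> str:
    """Hypothetical sketch of the sanitization steps, without the rulebase."""
    # Control characters and hyphens become spaces.
    text = re.sub(r"[\x00-\x1f\x7f-]", " ", name)
    # Quotes are deleted so the parts on either side join up: O'Tool -> OTool.
    text = text.replace("'", "").replace('"', "")
    # Noise characters become spaces (assumption) so neighbouring words stay apart.
    text = "".join(" " if ch in NOISE else ch for ch in text)
    text = text.upper()
    # Split into words and standalone commas; whitespace is discarded here,
    # which also removes extra spaces.
    tokens = re.findall(r"[^\s,]+|,", text)
    front, moved = [], []
    for i, tok in enumerate(tokens):
        if tok == ",":
            continue
        # A word directly followed by a comma is moved to the end of the string.
        followed_by_comma = i + 1 < len(tokens) and tokens[i + 1] == ","
        (moved if followed_by_comma else front).append(tok)
    return " ".join(front + moved)


print(sanitize("Smith, John F."))         # JOHN F SMITH
print(sanitize("# Williams , $Richard"))  # RICHARD WILLIAMS
```

The sketch does not reproduce the "Jim O. Tool" row, which depends on behavior not described in this section.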
The sanitization module also contains a small rulebase. The rulebase is applied after all alphabetic characters have been converted to uppercase and extra blanks have been removed. It is used to recognize words that contain noise characters or prefixes that could be affected by the sanitization process. The sanitization rulebase also lets you convert non-alphanumeric characters to other symbols or words.
The First Word rule type was designed for commercial name searches in which a word in the first position of a name would be considered noise. There are times when a word in the middle of a commercial or corporate name helps identify a record, but the same word in the first position would obscure the search. Classifying noise words based on position could affect NameSearch’s ability to overcome sequence variations, so this rule should be applied judiciously and with great thought.
The sanitization rulebase can be easily modified using the NameSearch graphical user interface, the "Generation Shell."
| Before Sanitization | After Sanitization | Sanitization (without rulebase expertise) |
| c\o | CARE OF | C O |
| Mc Donald, Old | OLD MCDONALD | MC OLD DONALD |
| % | CARE OF | |
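As a rough illustration of how such a rulebase might behave, the sketch below applies three kinds of rules to text that is already uppercase with extra blanks removed. Everything in it is an assumption made for the example: the table names, the function `apply_rulebase`, and the First Word entry "THE" are hypothetical, and a real rulebase is maintained through the Generation Shell rather than in code.

```python
# Hypothetical rule tables, not the actual NameSearch rulebase.
REPLACE = {"C\\O": "CARE OF", "%": "CARE OF"}  # word -> replacement words
PREFIXES = {"MC"}                              # prefix joined to the next word
FIRST_WORD_NOISE = {"THE"}                     # noise only in first position


def apply_rulebase(text: str) -> str:
    """Expects uppercase text with extra blanks already removed."""
    words = text.split(" ")
    # First Word rule: drop a noise word only when it leads the name.
    if len(words) > 1 and words[0] in FIRST_WORD_NOISE:
        words = words[1:]
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in REPLACE:
            # Convert a symbol or noise-bearing word to other words.
            out.append(REPLACE[word])
        elif word in PREFIXES and i + 1 < len(words):
            # Protect a prefix by joining it to the word that follows.
            out.append(word + words[i + 1])
            i += 1
        else:
            out.append(word)
        i += 1
    return " ".join(out)


print(apply_rulebase("C\\O"))             # CARE OF
print(apply_rulebase("MC DONALD, OLD"))   # MCDONALD, OLD
print(apply_rulebase("THE OLD BANK"))     # OLD BANK
print(apply_rulebase("BANK OF THE WEST")) # BANK OF THE WEST
```

Joining "MC" to "DONALD" before the comma is processed is what would let the later comma step produce OLD MCDONALD instead of MC OLD DONALD.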