The sanitization module removes noise characters, extra spaces, and control characters, and converts lowercase letters to uppercase. Examples of noise characters are: @, #, $, %, ^, &, *, (, ), }, {, [, ]. Three characters have special meanings and are handled separately: commas, hyphens, and quotes. A comma usually indicates an inverted name, with the last name written first, so sanitization moves any word followed by a comma to the end of the string. Quotes are deleted and the space between them is removed. Hyphens are replaced by a space.
Before Sanitization       After Sanitization
Scott Lions               SCOTT LIONS
Smith, John F.            JOHN F SMITH
Rose Stone-Shield         ROSE STONE SHIELD
James O'Tool              JAMES O TOOL
Owen, Tool, James         JAMES OWEN TOOL
# Williams, $Richard      RICHARD WILLIAMS
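The basic steps described above can be sketched in a few lines of Python. This is only an illustration, not the NameSearch implementation: the function name `sanitize` and the constant `NOISE_CHARS` are hypothetical, and two behaviors are inferred from the example table rather than stated in the text (periods are treated as noise, and an apostrophe is replaced by a space).

```python
import re

# Noise characters listed in the text. The period is an assumption
# inferred from the example "Smith, John F." -> "JOHN F SMITH".
NOISE_CHARS = set("@#$%^&*(){}[].")


def sanitize(name: str) -> str:
    """Hypothetical sketch of the basic sanitization steps (no rulebase)."""
    # Hyphens become spaces. Apostrophes and double quotes are also turned
    # into spaces here, an assumption that matches "O'Tool" -> "O TOOL".
    text = re.sub(r"[-'\"]", " ", name)
    # Drop noise characters and control characters, keeping whitespace.
    text = "".join(
        ch for ch in text
        if ch not in NOISE_CHARS and (ch.isprintable() or ch.isspace())
    )
    # Attach each comma to the word before it so "Smith ,John" also works.
    text = re.sub(r"\s*,\s*", ", ", text)
    head, tail = [], []
    # split() also collapses runs of extra spaces.
    for token in text.upper().split():
        word = token.rstrip(",")
        if not word:
            continue  # a stray comma with no word in front of it
        # Words followed by a comma are moved to the end of the string.
        (tail if token.endswith(",") else head).append(word)
    return " ".join(head + tail)


print(sanitize("Smith, John F."))        # JOHN F SMITH
print(sanitize("# Williams, $Richard"))  # RICHARD WILLIAMS
```

Collecting the comma-marked words in a separate list keeps their relative order, which is what the "Owen, Tool, James" row appears to require.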
The sanitization module also contains a small rulebase. The rulebase is applied after all alphabetic characters have been converted to uppercase and extra blanks have been removed. It is used to recognize words that contain noise characters or prefixes that could be affected by the sanitization process. The sanitization rulebase also makes it possible to convert non-alphanumeric characters to other symbols or words.
A new rule type, the First Word rule, was introduced in version 2.6 of the NameSearch product. It was designed for commercial name searches in which a word in the first position of a name should be treated as noise. In some cases a word in the middle of a commercial or corporate name helps identify a record, while the same word in the first position obscures the search. Classifying noise words by position could affect NameSearch’s ability to overcome sequence variations, so this rule type should be applied judiciously and with great thought. The sanitization rulebase can be easily modified using the NameSearch graphical user interface, the “Generation Shell.”
Examples of sanitization where the sanitization rulebase is used:
Before Sanitization    After Sanitization    Sanitization without the rulebase
c\o                    CARE OF               C O
Mc Donald, Old         OLD MCDONALD          MC OLD DONALD
%                      CARE OF
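A rulebase of this kind can be pictured as an ordered list of pattern and replacement pairs applied to the uppercased, blank-normalized string. The sketch below is a guess at the idea only: the rule format, the names `REPLACE_RULES`, `FIRST_WORD_NOISE`, and `apply_rulebase`, and the sample first-word entry "THE" are all hypothetical and do not come from NameSearch. The three replacement rules mirror the rows of the table above.

```python
import re

# Hypothetical replacement rules: (pattern, replacement) pairs.
# (?<!\S) and (?!\S) require the match to stand alone as a word.
REPLACE_RULES = [
    (re.compile(r"(?<!\S)C\\O(?!\S)"), "CARE OF"),  # c\o -> CARE OF
    (re.compile(r"(?<!\S)%(?!\S)"), "CARE OF"),     # %   -> CARE OF
    (re.compile(r"(?<!\S)MC (?=[A-Z])"), "MC"),     # MC DONALD -> MCDONALD
]

# Hypothetical First Word rule: these words count as noise only when they
# appear in the first position. "THE" is an invented sample entry.
FIRST_WORD_NOISE = {"THE"}


def apply_rulebase(name: str) -> str:
    """Sketch of the rulebase stage: uppercase, squeeze blanks, apply rules."""
    # The text says the rules run after uppercasing and blank removal.
    text = " ".join(name.upper().split())
    for pattern, replacement in REPLACE_RULES:
        text = pattern.sub(replacement, text)
    words = text.split()
    # Drop a first-position noise word, but never empty the name entirely.
    if len(words) > 1 and words[0] in FIRST_WORD_NOISE:
        words = words[1:]
    return " ".join(words)


print(apply_rulebase("Mc Donald, Old"))  # MCDONALD, OLD
print(apply_rulebase("c\\o"))            # CARE OF
```

In this sketch the output still contains the comma, so the remaining sanitization steps (moving comma-marked words to the end and stripping noise characters) would have to run afterwards to reach OLD MCDONALD. Because the First Word check looks only at position zero, the same word elsewhere in the name is left alone, which is also why such rules interact with sequence variations.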