Sanitization

 

The sanitization module removes noise characters, extra spaces, and control characters, and converts lowercase letters to uppercase.  Examples of noise characters are:  @, #, $, %, ^, &, *, (, ), }, {, [, ].  Commas, hyphens, and quotes are handled separately because they have special meanings.  A comma usually marks a last name that has been written ahead of the rest of the name, so sanitization moves each word that is followed by a comma to the end of the string.  Quotes are deleted and the space between them is removed.  Hyphens are replaced by a space.

Examples of Sanitization:

 

Before Sanitization           After Sanitization

Scott       Lions             SCOTT LIONS
Smith,   John F.              JOHN F SMITH
Rose Stone-Shield             ROSE STONE SHIELD
James O'Tool                  JAMES O TOOL
Owen, Tool, James             JAMES OWEN TOOL
# Williams,  $Richard         RICHARD WILLIAMS
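
The behavior described above can be approximated in a few lines of Python.  This is a minimal sketch, not the NameSearch implementation: the function name sanitize and the NOISE set are hypothetical.  It assumes that periods and backslashes also count as noise, and that noise characters, hyphens, and quotes are each replaced by a space before extra blanks are collapsed, which is what the examples suggest.

```python
# Noise characters listed in the text, plus the period and backslash
# (an assumption based on the examples, not a documented list).
NOISE = set("@#$%^&*(){}[].\\")


def sanitize(raw: str) -> str:
    """Hypothetical sketch of the basic sanitization pass (no rulebase)."""
    chars = []
    for ch in raw:
        is_control = ord(ch) < 32 or ord(ch) == 127
        if ch in NOISE or ch in "-'\"" or is_control:
            # Assumption: these become a blank, which is collapsed later.
            chars.append(" ")
        else:
            chars.append(ch.upper())
    text = "".join(chars)

    # Make sure every comma ends its word, then split on runs of blanks.
    words = text.replace(",", ", ").split()

    # Words followed by a comma move to the end, keeping their order.
    kept, moved = [], []
    for word in words:
        if word.endswith(","):
            stripped = word.rstrip(",")
            if stripped:
                moved.append(stripped)
        else:
            kept.append(word)
    return " ".join(kept + moved)


if __name__ == "__main__":
    for name in ["Smith,   John F.", "Owen, Tool, James", "James O'Tool"]:
        print(f"{name!r} -> {sanitize(name)!r}")
```

Under these assumptions the sketch reproduces each row of the table, for example JOHN F SMITH for "Smith,   John F." and JAMES OWEN TOOL for "Owen, Tool, James".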

 

The sanitization module also contains a small rulebase.  The rulebase is applied after all alphabetic characters have been converted to uppercase and extra blanks have been removed.  It recognizes words that contain noise characters or prefixes that could be affected by the sanitization process.  The rulebase also makes it possible to convert non-alphanumeric characters to other symbols or words.

A new rule type, First Word, was introduced in version 2.6 of the NameSearch product.  It was designed for commercial name searches in which a word in the first position of a name is considered noise.  In some cases a word in the middle of a commercial or corporate name helps identify a record, while the same word in the first position obscures the search.  Classifying noise words by position could affect NameSearch’s ability to overcome sequence variations, so this rule type should be applied judiciously.

The sanitization rulebase can be easily modified using the NameSearch graphical user interface, the “Generation Shell.”
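
The idea behind the First Word rule type can be illustrated with a short Python sketch.  It is only an illustration under stated assumptions: the function name, the FIRST_WORD_NOISE set, and its single entry THE are hypothetical, and the real rules are defined in the rulebase rather than in code.

```python
# Hypothetical first-position noise words; real entries live in the rulebase.
FIRST_WORD_NOISE = {"THE"}


def apply_first_word_rules(name: str) -> str:
    """Drop the leading word if it is listed as first-position noise."""
    words = name.split()
    # Keep a lone word so the name never becomes empty.
    if len(words) > 1 and words[0] in FIRST_WORD_NOISE:
        words = words[1:]
    return " ".join(words)


if __name__ == "__main__":
    print(apply_first_word_rules("THE FIRST NATIONAL BANK"))
    print(apply_first_word_rules("BANK OF THE WEST"))
```

In this sketch the word is dropped only in the first position, so the same word in the middle of a name still contributes to the search.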

  

Examples of sanitization where the sanitization rulebase is used:

 

Before Sanitization     After Sanitization      Sanitization without the Rulebase

c\o                     CARE OF                 C O
Mc Donald, Old          OLD MCDONALD            MC OLD DONALD
%                       CARE OF
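
The rulebase step behind these examples might look like the following Python sketch.  The RULES list and the function name apply_rulebase are hypothetical, and the three patterns are modeled only on the rows above; the actual rule syntax used by NameSearch is not shown in this section.  The sketch covers the rulebase step alone, so its output would still pass through the remaining sanitization steps (noise removal and comma handling).

```python
import re

# Hypothetical rules modeled on the three examples in the table.
# Each pattern is matched against upper-cased, blank-collapsed input.
RULES = [
    (re.compile(r"(?<!\S)C\\O(?!\S)"), "CARE OF"),  # c\o as a whole word
    (re.compile(r"(?<!\S)%(?!\S)"), "CARE OF"),     # a stand-alone percent sign
    (re.compile(r"(?<!\S)MC (?=[A-Z])"), "MC"),     # join the MC prefix to the next word
]


def apply_rulebase(raw: str) -> str:
    """Upper-case, collapse blanks, then apply each rule in order."""
    text = " ".join(raw.upper().split())
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    for name in ["c\\o", "Mc Donald, Old", "%"]:
        print(f"{name!r} -> {apply_rulebase(name)!r}")
```

Applying the rules before noise characters are stripped is what lets c\o and % survive as CARE OF, and joining MC to the following word keeps the prefix attached when the comma handling later moves that word to the end.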