Job. Four different job types are supported:
A Basic Profiling Job lets you determine various data quality indicators for individual columns or for an entire table. Data quality indicators are typically numeric values calculated for a subset of the table records or for the whole table; the average value and the unique value count are examples. In the DataProfiler application these indicators are called Statistics.
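The Python sketch below shows the kind of per-column Statistics a Basic Profiling Job reports. It computes record count, null count, unique value count and, for numeric columns, the average value. It is a minimal illustration under stated assumptions, not the DataProfiler's implementation. The helper name `column_statistics`, the dict-per-record table layout and the sample data are all hypothetical.

```python
from statistics import mean

def column_statistics(records, column):
    """Compute a few example statistics for one column (hypothetical helper)."""
    values = [r.get(column) for r in records]
    present = [v for v in values if v is not None]
    stats = {
        "record_count": len(records),
        "null_count": len(values) - len(present),
        "unique_count": len(set(present)),
    }
    # Only average columns whose non-null values are all numbers; bool is excluded on purpose.
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric and len(numeric) == len(present):
        stats["average"] = mean(numeric)
    return stats

# Hypothetical sample table: each record is a dict of column name -> value.
customers = [
    {"age": 34, "state": "CA"},
    {"age": 40, "state": "NY"},
    {"age": None, "state": "CA"},
    {"age": 28, "state": "TX"},
]

print(column_statistics(customers, "age"))
print(column_statistics(customers, "state"))
```

To profile a whole table, the same function could simply be applied to every column name.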
To illustrate Distribution Profiling, consider a table of customer addresses in which you need to find out how many addresses are invalid. A street address is typically at least 10 characters long, so shorter addresses are most likely incorrect. To estimate how many there are, it helps to see how many records have an address length of 1, 2, 3 characters, and so on. This information can be plotted as a graph with address length on the horizontal axis and record count on the vertical axis. One or more data points can be selected on the graph to drill down to the original data records. More generally, a distribution can be plotted for any numeric statistic against any non-aggregate expression.
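The sketch below works through the address-length example in Python. It groups records by a non-aggregate expression (address length), applies a numeric statistic to each group (record count), and drills down to the records behind selected data points. The function names, the sample addresses and the choice to strip whitespace before measuring length are illustrative assumptions. The 10-character threshold comes from the text.

```python
from collections import defaultdict

def distribution(records, expression, statistic=len):
    """Group records by a non-aggregate expression and apply a numeric
    statistic to each group; statistic=len gives a record count per value."""
    groups = defaultdict(list)
    for record in records:
        groups[expression(record)].append(record)
    # Sort by x value so the result can be plotted left to right.
    return {x: statistic(groups[x]) for x in sorted(groups)}

def drill_down(records, expression, selected_points):
    """Return the original records behind the selected data points."""
    selected = set(selected_points)
    return [record for record in records if expression(record) in selected]

MIN_ADDRESS_LENGTH = 10  # rule of thumb from the text

def address_length(address):
    # Assumption: leading/trailing whitespace does not count toward length.
    return len(address.strip())

# Hypothetical sample data.
addresses = ["12 Main Street", "PO", "", "45 Oak Avenue Apt 2", "X", "7 Elm St", "12 Main Street"]

counts = distribution(addresses, address_length)
suspect = drill_down(addresses, address_length, range(MIN_ADDRESS_LENGTH))
print(counts)
print(len(suspect), "suspect addresses:", suspect)
```

Because `expression` and `statistic` are plain functions, the same two helpers cover any statistic and any grouping expression. This matches the general case described above.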
CorrectAddress Profiling refers to address standardization and to the analysis of a data source to determine whether the address information it contains is correctable and whether it includes duplicate addresses.
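The following Python toy shows why standardization matters for finding duplicate addresses. Once the text is normalized, two spellings of the same address collapse to one form. This is an assumption-laden illustration only, not how CorrectAddress works; a real address corrector relies on postal reference data. The suffix table holds a handful of USPS-style abbreviations, and the helper names are hypothetical.

```python
import re
from collections import Counter

# A few USPS-style street suffix abbreviations (illustrative subset only).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
            "DRIVE": "DR", "BOULEVARD": "BLVD", "LANE": "LN"}

def standardize(address):
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and abbreviate known street suffixes (hypothetical toy rules)."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)

def duplicate_groups(addresses):
    """Return standardized addresses that occur more than once, with counts."""
    counts = Counter(standardize(a) for a in addresses)
    return {addr: n for addr, n in counts.items() if n > 1}

# Hypothetical sample data.
addresses = ["12 Main Street", "12 main st.", "45 Oak Ave", "45 Oak Avenue", "7 Elm Rd"]
print(duplicate_groups(addresses))
```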
Duplicates Profiling lets you estimate the number of potential duplicates in the data using IST's fuzzy searching and matching algorithms. The job does not perform actual deduplication; it only estimates the number of duplicates. Actual deduplication can be performed using MerlinMerge SpeedPro.
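IST's fuzzy matching algorithms are not described here. The Python sketch below therefore substitutes the standard library's `difflib.SequenceMatcher` similarity ratio to show the idea of estimating duplicates without removing any records. The 0.85 threshold, the helper names and the sample names are hypothetical. The pairwise comparison is O(n²), so a production profiler would need a technique such as blocking to scale.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (difflib stand-in for IST's matcher)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def estimate_duplicates(values, threshold=0.85):
    """Count records that fuzzily match at least one earlier record.
    Nothing is removed; the result is only an estimate (threshold is hypothetical)."""
    duplicates = 0
    for i in range(1, len(values)):
        if any(similarity(values[i], values[j]) >= threshold for j in range(i)):
            duplicates += 1
    return duplicates

# Hypothetical sample data.
names = ["John Smith", "Jon Smith", "Mary Jones", "John Smyth", "Peter Brown"]
print(estimate_duplicates(names), "potential duplicates out of", len(names))
```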
- Easy-to-use Windows GUI that connects to a variety of platforms and data sources.
- Flexible data sampling.
- Fast execution: millions of records can be profiled in seconds.
- No limitations on the number or the size of files or database tables.
- A variety of predefined, standard data metrics.
- Basic statistics and counts.
- Ability to extend and create new data metrics and expressions.
- Centralized repository for results and job definitions.
- Profiling history storage.
- Extensive reporting.
- Scalable histograms and linear graphs.
- Drill-down capabilities within data views or within the graphical profiles.
- Data distribution analysis.
With only a few mouse clicks, both technical and non-technical users can gather valuable data metrics. Drill-down capabilities give further insight into problem areas.
Increases ROI on projects and applications constrained by bad data
Data collection has redefined the way organizations do business. Many applications and business processes fail because of bad data. These issues can be identified, and most of them corrected, early in the project cycle.
Saves time and money spent manually checking
Users can perform complex operations in seconds, and technical resources can focus on other initiatives rather than time-consuming and error-prone calculations.
Visualizes data for easy business rule discovery
Once data characteristics and problems are identified, it is much easier to define business rules and determine an appropriate course of action.
Identifies required data quality procedures
This step is essential for understanding the data before establishing the necessary data quality processes.
Assists in creating and complying with industry and enterprise standards
A product such as the DataProfiler can contribute to improving and standardizing data and data quality routines enterprise-wide. It also enables companies to create and implement standard industry practices.



Data Profiler® Features— Data Profiling, Data Assessment, Data Discovery