StatDQ is an essential tool for:
The application's functionality includes data profiling, data cleansing (parsing, standardization and duplicate removal), data enhancement, and the deployment of automated data quality processes.
The application can be used to identify, monitor and solve problems in data sources.

The applications of StatDQ

Profiling

The application supports profiling, i.e. exploring the data in order to discover phenomena which negatively influence its quality. The software includes a collection of interactive tools which support the identification of problems and verification of data quality from business as well as technical points of view. Statistical analysis and visualization of data are also supported.

Data Cleansing

For the purposes of Data Cleansing, the StatDQ application supports operations like:
- parsing - the splitting of one data field into two or more fields, based on the meaning of the stored information and its context (for example first and last name, area code and city, etc.),
- standardization - replacing multiple synonymous values of the same variable with a single value. For example, the values “Warszawa” and “Wa-wa” can be replaced by a single, user-defined variant. The application can use its built-in standardization dictionaries; custom dictionaries based on customer data can also be created,
- duplicate removal - detecting repeated records in a database and consolidating them.
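
The three cleansing operations above can be illustrated with a minimal Python sketch. It is not StatDQ code: the function names, the tiny synonym dictionary and the sample rows are hypothetical, the parser assumes the NN-NNN area code format, and duplicates are handled by simply keeping the first record per key rather than by consolidating field values.

```python
import re

# Hypothetical standardization dictionary: synonymous spellings -> canonical value.
CITY_SYNONYMS = {"wa-wa": "Warszawa", "warszawa": "Warszawa"}


def parse_area_code_city(field):
    """Parsing: split 'NN-NNN City' into separate area code and city fields."""
    match = re.fullmatch(r"\s*(\d{2}-\d{3})\s+(.+?)\s*", field)
    if match is None:
        # No recognizable area code: treat the whole field as the city.
        return {"area_code": None, "city": field.strip()}
    return {"area_code": match.group(1), "city": match.group(2)}


def standardize_city(city):
    """Standardization: replace known synonyms with a single canonical variant."""
    return CITY_SYNONYMS.get(city.strip().lower(), city.strip())


def remove_duplicates(records):
    """Duplicate removal: keep the first record for each (name, area code, city)."""
    seen = set()
    unique = []
    for record in records:
        key = (record["name"].lower(), record["area_code"], record["city"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def cleanse(rows):
    """Run parsing, standardization and duplicate removal in sequence."""
    cleaned = []
    for name, address in rows:
        parsed = parse_area_code_city(address)
        parsed["city"] = standardize_city(parsed["city"])
        cleaned.append({"name": name.strip(), **parsed})
    return remove_duplicates(cleaned)


# Made-up sample rows, for illustration only.
rows = [
    ("Jan Kowalski", "01-515 Wa-wa"),
    ("Jan Kowalski", "01-515 Warszawa"),
    ("Anna Nowak", "Warszawa"),
]
result = cleanse(rows)
```

In this sketch the first two rows become identical once “Wa-wa” is standardized, so only one of them survives duplicate removal. The order of the steps matters: standardizing before deduplicating is what allows the two spellings to be recognized as the same record.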

Data enhancement

- merging external data sources - the application has extended capabilities of matching records from different data sources, which provides the means to merge two or more different databases. The application also supports probabilistic matching, which can merge data even when the two sources have differently defined fields used for identifying records. For instance, it is possible to match the following records from two different sources:
Source 1 | John Smith, born on 1975/01/27 | 01-515 | Solidarity Street
Is this the same person as:
Source 2 | Smith J. | Warsaw | SOLIDARITY | January 1, 1975
- adding new information from dictionaries - StatDQ can enrich the data with new information taken from dictionaries. The application comes with a built-in dictionary of names and a Polish area code dictionary.
- householding - the application can identify relationships between customers, for example households or businesses, based on the information contained in the customer database.
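
The idea of probabilistic matching mentioned above can be sketched with a simplified Fellegi-Sunter style score, in which each compared field adds or subtracts evidence that two records describe the same person. This is only an illustration, not the algorithm StatDQ uses: the field names, the m and u probabilities and the threshold are all hypothetical, and the records are assumed to have been parsed and normalized beforehand (only the birth year is compared).

```python
import math

# Hypothetical per-field probabilities (not taken from StatDQ):
#   m = P(field agrees | records describe the same person)
#   u = P(field agrees | records describe different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_initial": (0.90, 0.05),
    "birth_year": (0.95, 0.02),
    "city": (0.90, 0.10),
}

THRESHOLD = 8.0  # hypothetical cut-off above which a pair is treated as a match


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood ratios over the fields present in both records."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a field missing on either side adds no evidence
        if a == b:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


# Records with differently defined fields: one source stores an area code,
# the other a city name.
source_1 = {"surname": "smith", "first_initial": "j", "birth_year": 1975,
            "area_code": "01-515"}
source_2 = {"surname": "smith", "first_initial": "j", "birth_year": 1975,
            "city": "warsaw"}
other = {"surname": "nowak", "first_initial": "a", "birth_year": 1975}

score_same = match_weight(source_1, source_2)
score_other = match_weight(source_1, other)
```

With these assumed probabilities, the pair that agrees on surname, initial and birth year scores well above the threshold, while the pair that shares only the birth year scores below it. In practice the m and u values would be estimated from the data rather than fixed by hand.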

Deployment of automated Data Quality processes

The application provides the means to attain and maintain a desired level of data quality over a longer period of time. This goal is achieved by automating some of the Data Quality processes and executing them periodically. Besides automation, the application also supports supervision of the implemented data cleansing process through automated reporting and monitoring. The StatDQ application also provides the means for periodic data quality control through business rule validation, analysis of data quality indicators over time, and analysis of the stability of variables over time.
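
Business rule validation and the tracking of a data quality indicator over time can be sketched as follows. The rules, record layout, tolerance and function names are hypothetical and serve only to illustrate the idea; they do not describe how StatDQ defines or stores its rules.

```python
import re

# Hypothetical business rules: rule name -> predicate over a single record.
RULES = {
    "area_code_format": lambda r: re.fullmatch(
        r"\d{2}-\d{3}", r.get("area_code") or "") is not None,
    "name_present": lambda r: bool((r.get("name") or "").strip()),
}


def quality_indicators(records):
    """Return, for each rule, the fraction of records that satisfy it."""
    if not records:
        raise ValueError("no records to validate")
    return {
        name: sum(1 for r in records if rule(r)) / len(records)
        for name, rule in RULES.items()
    }


def degraded(previous, current, tolerance=0.05):
    """List the rules whose pass rate dropped by more than `tolerance`."""
    return [name for name in current
            if previous[name] - current[name] > tolerance]


# Two made-up snapshots of the same table taken in consecutive runs.
run_1 = [
    {"name": "Jan Kowalski", "area_code": "01-515"},
    {"name": "Anna Nowak", "area_code": "00-001"},
]
run_2 = [
    {"name": "Jan Kowalski", "area_code": "01515"},
    {"name": "Anna Nowak", "area_code": "00-001"},
]

indicators_1 = quality_indicators(run_1)
indicators_2 = quality_indicators(run_2)
alerts = degraded(indicators_1, indicators_2)
```

Comparing the indicators from consecutive runs is what turns a one-off validation into monitoring: here the second snapshot contains a malformed area code, so the pass rate of that rule falls and the rule is flagged.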

StatDQ features

The StatDQ application is an original solution developed by StatConsulting. This makes it possible to tailor the product to the individual needs of the customer, and to maintain and further develop the dedicated software afterwards. The features of the application include:
- scalability - the software has been tested in projects involving demographic data with over 3 million records,
- adaptation to the Polish data environment - the software comes with a set of ready-made rules and algorithms tailored to the specific features of data sets used in Poland,
- flexibility - the rules and algorithms used for data cleansing can be freely defined and modified by the user,
- importing and exporting of data - StatDQ can import data from various relational database systems. It is also possible to import data from text files in CSV or XML format, and to read data from spreadsheets. The application supports the viewing and editing of tables and their contents.
- reporting - the StatDQ application can generate periodic reports containing the desired amount of information and statistics. User-defined report templates are also supported. The application integrates with the MS Office and OpenOffice software suites.