Task: Verify Data Quality
Examine the quality of the data, addressing questions such as: Is the data complete (does it cover all the cases required)? Is it correct, or does it contain errors and, if there are errors, how common are they? Are there missing values in the data? If so, how are they represented, where do they occur, and how common are they?

Main Description

Data are rarely perfect. Most data contain coding errors, missing values, or other inconsistencies that complicate analysis. One way to avoid these pitfalls is to conduct a thorough quality analysis of the available data before modeling.

The reporting tools in IBM® SPSS® Modeler (such as the Data Audit, Table and other output nodes) can help you look for the following types of problems:

  • Missing data include values that are blank or coded as a non-response (such as $null$, ?, or 999).
  • Data errors are usually typographical errors made in entering the data.
  • Measurement errors include data that are entered correctly but are based on an incorrect measurement scheme.
  • Coding inconsistencies typically involve nonstandard units of measurement or value inconsistencies, such as the use of both M and male for gender.
  • Bad metadata include mismatches between the apparent meaning of a field and the meaning stated in a field name or definition.
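
Outside of Modeler, the first and fourth checks in the list above can be approximated with a few lines of plain Python. The following is a minimal sketch, not a substitute for the Data Audit node. The set of missing-value codes, the helper names (is_missing, missing_report, value_counts), and the sample records are all assumptions made for illustration. In particular, treating 999 as missing is only safe for fields where 999 cannot be a legitimate value.

```python
from collections import Counter

# Assumed sentinel codes for "missing"; adjust these to the data set at hand.
MISSING_CODES = {"", "$null$", "?", "999"}


def is_missing(value):
    """Return True if value is None, blank, or one of the assumed sentinel codes."""
    return value is None or str(value).strip() in MISSING_CODES


def missing_report(records):
    """Map each field to (missing count, missing share of all records)."""
    if not records:
        return {}
    fields = set()
    for row in records:
        fields.update(row)
    counts = Counter()
    for row in records:
        for field in fields:
            # A field absent from a row counts as missing as well.
            if is_missing(row.get(field)):
                counts[field] += 1
    total = len(records)
    return {f: (counts[f], counts[f] / total) for f in sorted(fields)}


def value_counts(records, field):
    """Tally the distinct non-missing values of one field to expose mixed codings."""
    return Counter(
        str(row[field]).strip()
        for row in records
        if not is_missing(row.get(field))
    )


# Hypothetical sample data, invented for this sketch.
records = [
    {"age": 34, "gender": "M", "income": 52000},
    {"age": 999, "gender": "male", "income": "?"},
    {"age": 41, "gender": "F", "income": "$null$"},
    {"age": None, "gender": "M", "income": 61000},
]

report = missing_report(records)
gender_codes = value_counts(records, "gender")

for field, (count, share) in report.items():
    print(f"{field}: {count} missing ({share:.0%})")
print(dict(gender_codes))
```

The per-field report answers where missing values occur and how common they are. The value tally shows both M and male in the same field, which is the kind of coding inconsistency described above. Data errors, measurement errors, and bad metadata generally need domain knowledge and cannot be detected by counting alone.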