Task: Verify Data Quality
Purpose
Examine the quality of the data, addressing questions such as: Is the data complete (does it cover all the cases
required)? Is it correct, or does it contain errors and, if there are errors, how common are they? Are there
missing values in the data? If so, how are they represented, where do they occur, and how common are they?
Main Description
Data are rarely perfect. In fact, most data contain coding errors, missing values, or other inconsistencies that complicate analysis. One way to avoid these pitfalls is to conduct a thorough quality analysis of the available data before modeling.
The reporting tools in IBM® SPSS® Modeler (such as the Data Audit, Table and other output nodes) can help you look for the following types of problems:
- Missing data include values that are blank or coded as a non-response (such as $null$, ?, or 999).
- Data errors are usually typographical errors made in entering the data.
- Measurement errors include data that are entered correctly but are based on an incorrect measurement scheme.
- Coding inconsistencies typically involve nonstandard units of measurement or value inconsistencies, such as the use of both M and male for gender.
- Bad metadata include mismatches between the apparent meaning of a field and the meaning stated in a field name or definition.
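Outside of Modeler, two of the checks above (missing values and coding inconsistencies) can be illustrated with a minimal Python sketch. This is not how the Data Audit node works; it only shows the idea. The sentinel codes are taken from the examples in the list, while the function names (`is_missing`, `missing_report`, `value_counts`) and the sample records are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Assumed codes that stand for "missing"; taken from the examples in the
# list above. Adjust them for each dataset.
MISSING_CODES = {"", "$null$", "?", "999"}


def is_missing(value):
    """Return True if a value is None, blank, or one of the assumed codes."""
    return value is None or str(value).strip() in MISSING_CODES


def missing_report(records, fields):
    """Count missing values per field and express them as a share of all records."""
    total = len(records)
    report = {}
    for field in fields:
        count = sum(1 for row in records if is_missing(row.get(field)))
        report[field] = {
            "missing": count,
            "share": count / total if total else 0.0,
        }
    return report


def value_counts(records, field):
    """Tally distinct non-missing values of a field to expose inconsistent codings."""
    return Counter(
        str(row[field]).strip()
        for row in records
        if not is_missing(row.get(field))
    )


# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "gender": "M", "age": 34},
    {"id": 2, "gender": "male", "age": 999},
    {"id": 3, "gender": "F", "age": None},
    {"id": 4, "gender": "?", "age": 41},
]

report = missing_report(records, ["gender", "age"])
gender_values = value_counts(records, "gender")

print(report)
print(gender_values)
```

In this sample, the tally for `gender` shows both `M` and `male`, which is the kind of coding inconsistency described above. Treating 999 as missing in every field is itself an assumption: in a field where 999 is a legitimate value, the code list would need to be set per field.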
Licensed Materials - Property of IBM. (c) Copyright IBM Corp. 2015.
IBM, the IBM logo, and SPSS are trademarks of International Business Machines Corp.,
registered in many jurisdictions worldwide. Other product and service names may be trademarks of IBM or
other companies. You may use the Content "AS IS" or modify them, however IBM will not be responsible for
any deficiencies or errors that result from modifications that you make.