Task: Clean Data
Purpose
Raise the data quality to the level required by the selected analysis techniques. This may involve selection of clean subsets of the data, the insertion of suitable defaults, or more ambitious techniques such as the estimation of missing data by modeling.
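The three approaches named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `records` list, its `age` and `income` fields, and the helper names are hypothetical, and the "modeling" step is a simple least-squares line, which is one possible choice among many.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "income": 30000.0},
    {"age": 35, "income": 40000.0},
    {"age": 45, "income": None},
    {"age": None, "income": 52000.0},
    {"age": 55, "income": 60000.0},
]


def select_clean_subset(rows):
    """Keep only rows that have no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def insert_default(rows, field, default):
    """Replace missing values of one field with a fixed default."""
    return [
        {**r, field: default if r[field] is None else r[field]}
        for r in rows
    ]


def estimate_by_model(rows, target, predictor):
    """Fill missing target values using a least-squares line
    fitted on the rows where both fields are present."""
    complete = [
        r for r in rows
        if r[target] is not None and r[predictor] is not None
    ]
    xs = [r[predictor] for r in complete]
    ys = [r[target] for r in complete]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar

    filled = []
    for r in rows:
        if r[target] is None and r[predictor] is not None:
            # Build a new dict so the input rows are left unchanged.
            r = {**r, target: intercept + slope * r[predictor]}
        filled.append(r)
    return filled
```

Each helper returns a new list rather than modifying its input, so the original data stays available for comparison when assessing how a cleaning step affects the analysis.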
Key Considerations

Document the decisions and actions taken to address the data quality problems reported during the Verify Data Quality task of the Understanding Data activity. Consider how each cleaning transformation may affect the analysis results. Also consider the following questions when creating your documentation:

What types of noise occurred in the data?

What approaches did you use to remove the noise? Which techniques were successful?

Are there any cases or attributes that could not be salvaged? Be sure to note data excluded due to noise.
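One way to keep the answers to these questions alongside the data is a small structured cleaning log. The sketch below is an assumption about how such a log might look, not a prescribed format; the class names, field names, and example entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CleaningAction:
    """One documented decision about a data quality problem."""
    attribute: str          # attribute (column) affected
    noise_type: str         # what kind of noise occurred
    approach: str           # technique used to remove it
    successful: bool        # whether the technique worked
    rows_excluded: int = 0  # cases dropped because of this noise


@dataclass
class CleaningReport:
    """Collects cleaning actions so they can be summarized later."""
    actions: list = field(default_factory=list)

    def record(self, action):
        self.actions.append(action)

    def total_excluded(self):
        """Total number of cases excluded due to noise."""
        return sum(a.rows_excluded for a in self.actions)

    def unsalvaged_attributes(self):
        """Attributes for which no cleaning approach succeeded."""
        return [a.attribute for a in self.actions if not a.successful]


# Hypothetical example entries.
report = CleaningReport()
report.record(CleaningAction("income", "missing values",
                             "regression estimate", True))
report.record(CleaningAction("age", "out-of-range values",
                             "row exclusion", True, rows_excluded=12))
report.record(CleaningAction("notes", "free-text typos",
                             "spell correction", False))
```

Calling `report.total_excluded()` and `report.unsalvaged_attributes()` then gives a quick summary of the data excluded due to noise and the attributes that could not be salvaged.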