About Data Cleansing
Data cleansing is an algorithm that searches through data for “outliers.”
Note: Data cleansing can alter good data and can fail to filter out some bad data points. Remember that data cleansing is an approximation.
- An “outlier” is a data value that lies far from the rest of the data: an extreme value that is either much lower or much higher than the other values in the data set. Outliers are known to skew means (averages). An outlier is not necessarily a bad data point, but in most cases the information is more useful without this “unusual” data.
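To see how a single extreme value skews an average, here is a small Python illustration. The readings are hypothetical values made up for this example, and the code uses only the standard `statistics` module.

```python
import statistics

# Hypothetical readings (illustrative only, not taken from any report).
typical = [10, 11, 10, 12, 11, 10]
with_outlier = typical + [100]  # add one extreme high value

mean_typical = statistics.mean(typical)
mean_skewed = statistics.mean(with_outlier)

print(f"mean without outlier: {mean_typical:.2f}")  # 10.67
print(f"mean with outlier:    {mean_skewed:.2f}")   # 23.43
```

One value out of seven more than doubles the mean, even though every other reading stays between 10 and 12.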
Data cleansing is available on all reports and includes the following parameters, as shown in Figure 3-52:
- Data Cleansing check box
Data cleansing is disabled by default. Select the check box to enable data cleansing. Clear the check box to disable data cleansing. When data cleansing is enabled, the following two parameters are available and allow you to specify the “intensity” of the search for outliers in the data.
- Window
Enter an integer in the Window field to define the number of surrounding data points to consider when determining whether a given point is an outlier. For example, with the default value of 4, data cleansing looks at the two points before and the two points after the point under investigation (PUI). A standard deviation is calculated from this surrounding “window” of 4 points and used with the Percentage parameter, as described below.
- Percentage
Enter a value in this field to specify the percentage of the standard deviation (calculated from the window of points) used to decide whether the PUI is a valid value (not an outlier). If the PUI falls outside this valid range, it is considered an outlier and its value is replaced by the linear interpolation of the two surrounding valid points. If the PUI falls within the range, the data point is considered valid and is used as is.
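The Window and Percentage behavior described above can be sketched in Python. This is a minimal illustration, not the product's implementation, and it rests on several assumptions that this guide does not specify: the valid range is centered on the mean of the window (the PUI itself excluded), the population standard deviation is used, points near the ends of the data use a truncated window, outliers are flagged on the original values before any replacement, samples are evenly spaced, and an outlier with no valid point on one side is left unchanged. The function names `find_outliers` and `cleanse` and the Percentage value of 200 are hypothetical.

```python
import bisect
import statistics
from typing import List, Sequence


def find_outliers(values: Sequence[float], window: int, percentage: float) -> List[bool]:
    """Return True for each point treated as an outlier.

    Assumption: a point is valid if it lies within
    mean(window) +/- (percentage / 100) * pstdev(window),
    where the window holds the neighbors only, not the PUI itself.
    """
    half = window // 2  # window=4 -> 2 points before and 2 after the PUI
    flags = []
    for i, pui in enumerate(values):
        # Neighbors on each side; truncated at the ends of the data (assumption).
        neighbors = list(values[max(0, i - half):i]) + list(values[i + 1:i + 1 + half])
        if len(neighbors) < 2:
            flags.append(False)  # too few neighbors to estimate a spread
            continue
        center = statistics.mean(neighbors)
        limit = (percentage / 100.0) * statistics.pstdev(neighbors)
        flags.append(abs(pui - center) > limit)
    return flags


def cleanse(values: Sequence[float], percentage: float, window: int = 4) -> List[float]:
    """Replace each outlier by linear interpolation between the nearest
    valid point before it and the nearest valid point after it."""
    flags = find_outliers(values, window, percentage)
    valid = [i for i, bad in enumerate(flags) if not bad]  # sorted indices
    result = list(values)
    for i, bad in enumerate(flags):
        if not bad:
            continue
        k = bisect.bisect_left(valid, i)  # valid[k] is the first valid index after i
        if k == 0 or k == len(valid):
            continue  # no valid point on one side: leave unchanged (assumption)
        a, b = valid[k - 1], valid[k]
        t = (i - a) / (b - a)  # fraction of the way from point a to point b
        result[i] = values[a] + t * (values[b] - values[a])
    return result


# Hypothetical data with one spike at index 3.
data = [10, 11, 10, 50, 11, 10, 11]
print(find_outliers(data, window=4, percentage=200))  # [False, False, False, True, False, False, False]
print(cleanse(data, percentage=200))                  # [10, 11, 10, 10.5, 11, 10, 11]
```

In this sketch, raising Percentage widens the valid range, so fewer points are flagged as outliers. Lowering it makes the search more aggressive.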
Figure 3-52 Data Cleansing Fields