Missing data management

In statistics, imputation is the process of replacing missing data with substituted values. It is one part of data cleansing (also called data cleaning), in which incomplete, incorrect, inaccurate, and irrelevant data are replaced, modified, or deleted.

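The position of the missing data in the series affects the calculations. Data may be missing at the beginning of the data set, interspersed among the other data, or at the end of the data set. A gap at the beginning or end has a known value on only one side, whereas an interspersed gap has known values on both sides.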
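The three positions can be told apart with a short scan of the series. The following Python sketch is illustrative only: the function name locate_gaps, the labels "leading", "interior", and "trailing", and the use of None to mark a missing value are assumptions made for this example, not part of the framework.

```python
from typing import List, Optional, Tuple


def locate_gaps(series: List[Optional[float]]) -> List[Tuple[int, int, str]]:
    """Return (start, end, position) for each run of missing values.

    A missing value is represented by None (an assumption of this sketch).
    """
    gaps = []
    i, n = 0, len(series)
    while i < n:
        if series[i] is not None:
            i += 1
            continue
        start = i
        # Advance to the end of this run of missing values.
        while i < n and series[i] is None:
            i += 1
        end = i - 1
        if start == 0:
            position = "leading"    # gap at the beginning of the data set
        elif end == n - 1:
            position = "trailing"   # gap at the end of the data set
        else:
            position = "interior"   # gap interspersed among other data
        gaps.append((start, end, position))
    return gaps


gaps = locate_gaps([None, 1.0, None, None, 4.0, None])
```

In this sketch a series that is missing entirely is reported as a single "leading" gap.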

The framework offers multiple strategies for managing missing data:

  • Linear interpolation estimates a missing value from the known values on either side of it in the series.
  • K-nearest neighbor finds the nearest neighbors to a missing datum, identifies the majority value represented by the neighbors, and fills in that value for the missing datum.
  • Aggregation strategies: ignore series and ignore point.
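The first two strategies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's implementation: the function names linear_interpolate and knn_fill are hypothetical, missing values are marked with None, points are assumed to be evenly spaced, and "nearest" is taken to mean closest by position in the series. The sketch leaves leading and trailing gaps unfilled by interpolation, which is also an assumption.

```python
from collections import Counter
from typing import Hashable, List, Optional


def linear_interpolate(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps on a straight line between the neighboring known values."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant change per position between two consecutive known points.
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def knn_fill(series: List[Optional[Hashable]], k: int = 3) -> List[Optional[Hashable]]:
    """Fill each missing value with the majority value of its k nearest known neighbors."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, value in enumerate(series):
        if value is not None or not known:
            continue
        # Assumption: distance is the difference in position within the series.
        nearest = sorted(known, key=lambda j: (abs(j - i), j))[:k]
        votes = Counter(series[j] for j in nearest)
        # On a tied vote, the value of the closest neighbor wins.
        filled[i] = votes.most_common(1)[0][0]
    return filled


interpolated = linear_interpolate([None, 2.0, None, 4.0, None])
voted = knn_fill(["a", "a", None, "b", "a"], k=3)
```

Both helpers return a new list and leave the input unchanged. The neighbors in knn_fill are always drawn from the original known values, never from values that were filled in earlier in the same pass.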

Analytics applies the missing data strategy that you select for filling in the missing data. The filled-in values are used only in the calculations. Analytics does not go back and update the history records themselves with the filled-in values.