1. Introduction

The data set contains 500 power consumption profiles, one per household (unit), spanning 122 days (Aug 1, 2017 to Nov 30, 2017). Clustering and regression are performed on these household power consumption profiles.

The first five observations of the first five households are previewed below. The physical unit of the entries is not stated in the data set and is presumed to be kilowatt (kW). The unit matters only for interpretation, because the forecasted values are more or less the same either way. If the entries are indeed electricity readings, the values can be converted as follows:

  • multiply by 2 to get hourly electricity consumption,
  • multiply by 48 to get daily electricity consumption.

household                0      1      2      3      4
datetime
2017-08-01 00:00:00  0.094  0.028  0.116  0.096  0.189
2017-08-01 00:30:00  0.039  0.050  0.068  0.077  0.156
2017-08-01 01:00:00  0.088  0.060  0.044  0.095  0.118
2017-08-01 01:30:00  0.046  0.023  0.067  0.092  0.145
2017-08-01 02:00:00  0.082  0.020  0.068  0.085  0.153

Power consumption profiles of households 0, 1, and 2 on Aug 1, 2017 are plotted:

Here is the hourly down-sampled averaged profile of household 0 on Aug 1, 2017, compared with the original profile.
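As a minimal sketch of the hourly down-sampling, assuming the profile is held in a pandas Series with a half-hourly DatetimeIndex (the variable names are illustrative and not taken from the original analysis), the first four readings of household 0 from the preview can be averaged per hour. The final multiplication by 2 is one possible reading of the conversion rule given in the introduction, not a confirmed step.

```python
import pandas as pd

# First four half-hourly readings of household 0, copied from the preview table.
index = pd.date_range("2017-08-01 00:00", periods=4, freq="30min")
profile = pd.Series([0.094, 0.039, 0.088, 0.046], index=index, name=0)

# Down-sample to one value per hour by averaging the two readings in each hour.
hourly_mean = profile.resample("60min").mean()

# Assumed reading of the introduction's conversion rule:
# twice the hourly average gives the hourly consumption.
hourly_consumption = 2 * hourly_mean

print(hourly_mean)
```

The two readings in the first hour, 0.094 and 0.039, average to 0.0665, so the down-sampled profile is smoother than the original but has half as many points.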

2. Daily Down-Sampled Averaged Profiles

Here are the daily down-sampled averaged profiles of households 0 and 1 for the whole period. On some days there are no entries. Such missing entries are discussed in the following section.
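The following sketch shows how daily averaging exposes a day without entries. It uses synthetic data (a constant 0.1 profile over three days, with one day blanked out), and it assumes that missing entries are stored as NaN; none of the names or values come from the original data set.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one household: three days of half-hourly readings.
index = pd.date_range("2017-08-01", periods=3 * 48, freq="30min")
profile = pd.Series(np.full(len(index), 0.1), index=index)

# Blank out the whole second day to mimic a day with no entries.
mask = index.normalize() == pd.Timestamp("2017-08-02")
profile[mask] = np.nan

# Daily down-sampling by averaging. mean() skips NaN values, so a day
# with no valid readings at all comes out as NaN.
daily_mean = profile.resample("D").mean()

# Days for which no daily average could be computed.
missing_days = daily_mean[daily_mean.isna()].index

print(daily_mean)
```

A day that is only partly missing would still get an average from its remaining readings, so a gap in the daily plot indicates a day with no valid entries at all.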

The entries for August 2017 are shown zoomed in.

3. Handle Missing Entries

The distribution of missing entries can be summarised as:

  • For households 162, 428 and 432, two entries are missing, on dates that differ between households.
  • For all the other households, 48 entries are missing, again on dates that differ between households.
  • There are only 186 time points at which the data are complete.
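A summary of this kind can be sketched with pandas as follows. The example runs on a small synthetic frame (three households, three days), not on the real data, and assumes that missing entries are stored as NaN in a frame with one column per household.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: three days of half-hourly readings for three households.
index = pd.date_range("2017-08-01", periods=3 * 48, freq="30min")
df = pd.DataFrame(0.1, index=index, columns=[0, 1, 2])
df.columns.name = "household"

# Households 0 and 1 each lose a different full day (48 entries);
# household 2 loses only two entries on the third day.
df.iloc[:48, 0] = np.nan
df.iloc[48:96, 1] = np.nan
df.iloc[[100, 120], 2] = np.nan

# Number of missing entries per household.
missing_per_household = df.isna().sum()

# How many households share each missing-entry count.
households_by_count = missing_per_household.value_counts()

# Number of time points at which every household has a value.
complete_time_points = int(df.notna().all(axis=1).sum())

print(missing_per_household)
print(complete_time_points)
```

Because the gaps fall on different dates for different households, the number of fully complete time points can be far smaller than the per-household counts suggest.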

Linear interpolation is not very useful here. For example, the way the entries for household 0 on Aug 16, 2017 are filled can be plotted. Such interpolation can distort both clustering and regression results.
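The effect can be reproduced on synthetic data. The sketch below assumes a sinusoidal daily cycle purely for illustration (it is not the real household profile), blanks out one full day, and fills it with pandas' linear interpolation.

```python
import numpy as np
import pandas as pd

# Synthetic profile with an assumed sinusoidal daily cycle (illustrative only).
index = pd.date_range("2017-08-15", periods=3 * 48, freq="30min")
half_hours = np.arange(len(index))
profile = pd.Series(0.1 + 0.05 * np.sin(2 * np.pi * half_hours / 48), index=index)

# Remove all 48 entries of Aug 16 to mimic a fully missing day.
mask = index.normalize() == pd.Timestamp("2017-08-16")
profile[mask] = np.nan

# Fill the gap by linear interpolation between the neighbouring readings.
filled = profile.interpolate(method="linear")

# The filled day lies on a straight line: consecutive differences are constant.
gap = filled[mask]
steps = gap.diff().dropna()

print(gap.min(), gap.max())
```

The filled day is a straight segment between the last reading before the gap and the first reading after it, so the daily cycle present on the neighbouring days is lost.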