Sales volume is a metric sometimes used to assess the quality and depth of the market, or the level of demand. In the Land Registry Price Paid dataset it is complicated by the delayed registration of some transactions. This applies particularly to sales of new-build properties, which developers may register in batches, but there are other sources of delay as well.

Backfilling of data also matters for the performance indices. In the front month, only a small number of rows generally have non-zero observations on repeat sales (properties sold previously and then resold this month). In a steady-flow state, the share of properties resold in any one month would be only about 1/(holding period in months). So if the average holding period is, say, 4 years, only about 1/48 of the rows are non-zero. It is a problem if delayed registration cuts this already small number further.
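The steady-state arithmetic above can be sketched in a few lines. This is a minimal illustration under stated assumptions: a constant flow of sales, and a hypothetical `recorded_share` parameter for the share of a month's sales registered so far. The function names are illustrative only and are not part of any existing library or of the index methodology itself.

```python
def monthly_resale_fraction(holding_years: float) -> float:
    """Steady-state share of properties expected to resell in a given month.

    Assumes a constant flow of sales, so each property sells on average
    once per holding period (converted here to months).
    """
    return 1.0 / (holding_years * 12)


def front_month_nonzero_rows(n_properties: int, holding_years: float,
                             recorded_share: float = 1.0) -> float:
    """Expected count of non-zero repeat-sale rows in the front month.

    recorded_share is a hypothetical parameter: the share of the month's
    sales already registered (1.0 means no delayed registration).
    """
    months = holding_years * 12
    return n_properties * recorded_share / months


print(monthly_resale_fraction(4))                # 1/48, about 0.0208
print(front_month_nonzero_rows(48_000, 4))       # all sales registered
print(front_month_nonzero_rows(48_000, 4, 0.4))  # only 40% registered so far
```

With a hypothetical 48,000 properties, the expected count falls from about 1,000 non-zero rows to about 400 if only 40% of the month's sales have been registered. This shows how delayed registration compounds an already sparse front month.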

Figure 1: Sales volume

It is worth first taking a quick look at the general patterns in volume. Figure 1 shows the time series of registrations for the 5 years to the end of 2018, as recorded in mid-2019. It shows some seasonality and a spike in 2016, when stamp duty was changed, especially on buy-to-let purchases. Purchases were brought forward ahead of the higher rates that took effect in April, resulting in anomalously heavy volume in March.

Figure 2: Seasonal volume

Since there is not much of a trend, we can estimate seasonality simply by taking the median volume for each calendar month over the last 5 years, shown in Figure 2. Using the median rather than the mean reduces the impact of the single March outlier. The seasonality is weaker than one might suppose. Volume does not really dip in the fourth quarter, but instead drops fairly sharply in January.
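A median-by-month seasonal estimate can be sketched with the standard library alone. This sketch assumes monthly counts arrive as `(year, month, count)` tuples, and it expresses the result as an index relative to the average of the twelve monthly medians. Both choices are illustrative assumptions rather than the exact method behind Figure 2. The toy data is hypothetical and includes one anomalous March to show that the median ignores it.

```python
from collections import defaultdict
from statistics import median


def seasonal_index(volumes):
    """Median-based seasonal index.

    volumes: iterable of (year, month, count) tuples, e.g. monthly
    registration counts over the last five years.
    Returns {month: index}, where index is that month's median volume
    divided by the mean of the monthly medians (1.0 = an average month).
    """
    by_month = defaultdict(list)
    for _year, month, count in volumes:
        by_month[month].append(count)
    medians = {m: median(counts) for m, counts in by_month.items()}
    overall = sum(medians.values()) / len(medians)
    return {m: medians[m] / overall for m in sorted(medians)}


# Hypothetical toy data: flat volume of 100 a month for five years,
# a weaker January (80), and one anomalous March (300) in 2016.
toy = []
for year in range(2014, 2019):
    for month in range(1, 13):
        count = 80 if month == 1 else 100
        if (year, month) == (2016, 3):
            count = 300
        toy.append((year, month, count))

idx = seasonal_index(toy)
print(round(idx[1], 3), round(idx[3], 3))  # January is low; March matches other months
```

Because the 2016 spike is one value out of five Marches, the March median stays at 100, exactly like the other flat months. A mean-based estimate would instead inflate March by 40%.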


Figure 3: volume(tau) related to volume(tau=6)

Having looked at the data, we arrive at the main point: how much backfilling occurs, and what impact it has on the volume statistic. To measure this, we use the dataset released each month, which we have stored in true ‘point in time’ form for the last 3 years. Figure 3 shows, for each lag, the fraction of the ‘final’ volume for a given month that had been recorded at that lag. There is also a 1-month reporting delay, which we ignore in the notation here. At lag 0, less than half of the sales are recorded, but most of the backfill arrives in the next month, at lag 1. By lag 6 we are close to the final figure.

We also show the correlation of the short-lag figure with the long-lag figure. From this we can infer that the front-month (lag 0) volume is a poor predictor of the final volume, whereas waiting just one month gives a much better result.
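The two statistics in Figure 3 can be computed from point-in-time snapshots roughly as follows. This is a sketch, not the procedure actually used. It assumes the snapshots have been rearranged so that each reference month maps to a list of volumes `[v0, v1, ..., v6]`, where `v_tau` is the volume recorded `tau` releases after the month first appears. It also assumes the lag-6 value stands in for the ‘final’ volume, following the figure title. The names `backfill_profile` and `FINAL_LAG` and the toy numbers are hypothetical.

```python
from statistics import mean

FINAL_LAG = 6  # assumption: treat the lag-6 figure as the 'final' volume


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def backfill_profile(vintages, final_lag=FINAL_LAG):
    """Summarise how recorded volume for a month fills in with lag.

    vintages: {ref_month: [v0, v1, ...]}, where v_tau is the volume for
    ref_month as recorded tau releases after it first appears. Months
    without a value at final_lag are skipped.
    Returns {tau: (mean fraction of final volume, correlation with final)}.
    """
    complete = [v for v in vintages.values() if len(v) > final_lag]
    finals = [v[final_lag] for v in complete]
    profile = {}
    for tau in range(final_lag + 1):
        at_tau = [v[tau] for v in complete]
        frac = mean(a / f for a, f in zip(at_tau, finals))
        profile[tau] = (frac, pearson(at_tau, finals))
    return profile


# Hypothetical toy vintages (not real Land Registry figures).
toy_vintages = {
    "2018-01": [30, 70, 85, 90, 95, 98, 100],
    "2018-02": [50, 80, 92, 100, 104, 107, 110],
    "2018-03": [40, 100, 120, 126, 130, 133, 135],
    "2018-04": [45, 88, 100, 105, 110, 113, 115],
    "2018-05": [48, 90],  # too recent: no lag-6 value yet, so excluded
}

for tau, (frac, corr) in backfill_profile(toy_vintages).items():
    print(f"lag {tau}: {frac:.2f} of final, corr {corr:.2f}")
```

Averaging the per-month ratios, rather than taking a ratio of totals, gives each reference month equal weight. Either choice is defensible.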

Conclusion: be very wary of using the ‘front month’. Now, in September, we in theory have sales for July (released at the end of August), but for many purposes it would be better to ignore them and simply use data up to the end of June.
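The rule of thumb above can be written as a small date helper. It assumes, as described, that data for a month is released at the end of the following month. It also assumes that the front month is dropped by default. The function name and its `skip_front` flag are hypothetical.

```python
def last_usable_month(year: int, month: int, skip_front: bool = True):
    """Latest reference month worth using, as of a given calendar month.

    Assumes data for month m is released at the end of month m+1, so
    during month m+2 the newest available (front) month is m. With
    skip_front=True that front month is dropped as well.
    """
    back = 3 if skip_front else 2
    index = year * 12 + (month - 1) - back  # months since year 0, zero-based
    return index // 12, index % 12 + 1


print(last_usable_month(2019, 9))                    # (2019, 6): end of June
print(last_usable_month(2019, 9, skip_front=False))  # (2019, 7): July front month
print(last_usable_month(2019, 2))                    # (2018, 11): crosses a year
```

Working in a single month count avoids special-casing the year boundary.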