
Measuring the Effect of Change on Physical Assets By Colin Parry

Colin Parry is the founder of Head for Data, a data consultancy based in Scotland. Prior to this he was Director of Data Science at arbnco, where the work described below was developed. Colin has worked in the energy field for 14 years using data to drive innovation and improve business outcomes. He holds four patents in the area of energy management in buildings.
In this post, Colin explains how he used data science to forecast the energy consumption of buildings. Measuring a building’s consumption has traditionally been rife with challenges, but Colin explains how he leveraged convolutional neural networks (CNNs) to create an innovative and accurate forecasting model:


How do you measure the effect of a change?

Ask a data scientist this question and they are likely to think of A/B testing.

An A/B test is a form of randomised controlled experiment in which the same metric is measured across two groups whose experience differs in one small way. For example, a website might measure click-through rate whilst slightly varying elements on a page, showing one group a red button and another group a green one. A drug trial might measure the recovery rate from illness whilst slightly varying the medication given to each group, giving one group a tablet with an active ingredient and the other a placebo.

This is now an industry standard way of measuring the effectiveness of a change. But there are some implicit assumptions:

  1. It must be physically possible to get samples from multiple groups simultaneously.
  2. The marginal cost of acquiring new samples cannot be prohibitive.
  3. Each sample must be comparable inside the group and across groups.

Serving up a new website layout only requires a few configuration changes and the cost of acquiring multiple groups is negligible, therefore this is an easy experiment to run. In the case of a drug trial, this is trickier as more people are required to participate so the marginal cost is higher, resulting in generally smaller sample sizes, but still enough to make the results meaningful. In both examples assumptions about the people involved need to be made – a website may segment visitors by interests or affluence to ensure the groups are comparable. The drug trial may limit participation to those with a certain BMI or age to reduce the effect of these latent variables on the results.
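For the website example above, the comparison between the two groups typically comes down to a standard significance test. The sketch below is a minimal two-proportion z-test in plain Python; the sample sizes and click counts are illustrative assumptions, not figures from the article.

```python
# Minimal two-proportion z-test for the red-button vs green-button example.
# Sample sizes and click counts below are illustrative, not real data.
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Return the z statistic for the difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)  # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Group A saw the red button, group B the green one.
z = two_proportion_z(clicks_a=120, n_a=1000, clicks_b=150, n_b=1000)
print(f"z = {z:.2f}")  # |z| > 1.96 would be significant at the 5% level
```

Note how cheaply this works when both groups exist simultaneously and samples cost nearly nothing; the rest of the article is about what to do when neither holds.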

What happens, though, if the thing being measured breaks all three of these rules? What if the measurement takes place on an asset that is one-of-a-kind, and it costs thousands of dollars to make a change?


An oft-quoted statistic in building science is that 80% of the buildings that will be occupied in 2050 already exist today[i]. Globally, buildings account for 40% of our energy consumption and 33% of our greenhouse gas emissions[ii]. Unlike areas such as transport and technology, reducing the energy consumption of the buildings we all live and work in cannot be primarily achieved by building cleaner and more efficient products. Clearly, new buildings will be more efficient than those already built, but for the large stock of already constructed buildings we need to rely on retrofits.

A retrofit to a building is anything that changes something already present in the building. Switching to LED lightbulbs, adding insulation and changing out an HVAC system or boiler for a more efficient alternative are all examples of retrofits aimed at reducing energy consumption. As well as the environmental benefit, these retrofits are cost-effective in the medium to long term, but do not bring immediate benefit to the building operator due to the often high upfront cost.

The traditional way to model energy consumption in buildings and assess how beneficial a retrofit might be, is through simulation. Software such as EnergyPlus[iii] is a physics-based engine that gives a very good picture of how the energy consumption of a building will vary over time. The downside is it needs a lot of accurate information about the building’s construction materials, internal layouts and usage patterns.

For a building in development, this will be known from the plans, but very few existing buildings (and it only gets worse the older the building) have plans sufficiently detailed to build a simulated model. Acquiring this information requires intrusive surveys and even then, there is still a lot of guesswork around materials, particularly in areas not visible to the eye. Translating this into model parameters requires experienced building scientists and is a time-consuming process that cannot be used at scale.


How does this involve data science? Let’s break down the problem.

Building operators want to reduce the energy consumption of their buildings, primarily to save money. Applying retrofits is expensive and it takes time to measure the effect, and since most buildings are unique, we cannot simply run one building with the change and one without. Creating a simulation of the change is time-consuming – and in many cases impossible – without specialist knowledge, and anyone responsible for more than a handful of buildings cannot get this done at scale.

Therefore, if we could build a data science model that approximates the physics-based simulation engine, it could be trained on a building's data to estimate normal operation before a retrofit is applied, and then compared to the actual performance post-retrofit.

A building typically has two main factors that affect its energy consumption: weather and human activity.

It is intuitive that extremes in weather require the building to be heated or cooled to make it habitable for the occupants. Where the building is situated will affect how exposed it is to weather; factors like whether the building is detached or terraced, has areas underground or is sheltered from the prevailing winds, will be important. Additionally, the level and frequency of human occupancy will affect energy intensity – a building used for 24 hours will require more energy than one used for 8 hours. The types of equipment used by human activity will also affect consumption, so building type (such as office or factory) will impact the way that the same occupancy pattern changes the energy used.


If we think about what we want to achieve here – we know weather and human occupancy patterns and want to forecast energy consumption – the model is effectively learning the transfer function of how the building reacts to these external factors, and this is clearly unique to each building.

It might be tempting to look at this as a problem requiring a universal solution. Maybe that would work, but it would need to factor in building type, age of construction and other hard-to-obtain information. The more of this information the model requires, the higher the chance that a prediction cannot be made when it is unavailable, and the closer the approach drifts back towards a physics-based solution. It is far simpler to train the model on one building at a time.

Since this model does not need to do real-time forecasting, we can afford to wait minutes for training time, and therefore an instantaneous response is unnecessary. Now we have defined the parameters, let us dive into the data used for training and prediction.


There’s a lot to consider when it comes to which weather data to use. This data can be acquired inexpensively from weather APIs, which offer a dizzying array of parameters ranging from temperature and pressure to less useful ones such as moon phase and ozone level. For this model, it was found that only air temperature and relative humidity were required to get good predictive power.

Whilst historical data is relatively easy to come by, for this use case the weather data must be forecasted into the future. Again, many weather APIs can forecast months ahead and this is how the data for this part of the model was obtained – more on this at the end.

The data on human occupancy would only be available if sensors were used to obtain it, and this is very unlikely to be the case for most buildings. Therefore, a clustering methodology was applied which identifies periods of similar usage and creates normalised patterns for each day. The exact way this is determined is beyond the scope of this article but is described in a separate patent, WO2022023454A1.

Once these load patterns have been calculated for historical data, they can be applied onto the future predictions by making intelligent deductions about building energy consumption patterns. If multiple years of data are available as part of the training dataset then the most likely usage case can be estimated, and where only one year of data is available then knowing the approximate building type can be used to estimate the typical annual pattern. For example, an office is likely to operate year-round but with known dips around public holidays, whilst a university will have drops in consumption when students are off.
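As a rough illustration of the idea, the sketch below derives normalised daily usage patterns from historical meter readings and projects them onto future dates. The patented clustering method is out of scope here, so as a stand-in assumption days are simply grouped into weekday and weekend; the real method identifies usage clusters rather than relying on the calendar.

```python
# Hedged sketch: normalised daily load patterns, grouped by a simple
# weekday/weekend rule standing in for the patented clustering step.
from datetime import date, timedelta
from collections import defaultdict

def normalise(day_readings):
    """Scale one day's readings so they sum to 1 (a usage 'shape')."""
    total = sum(day_readings)
    return [r / total for r in day_readings] if total else day_readings

def build_patterns(history):
    """history: {date: [half-hourly kWh readings]} -> mean shape per group."""
    groups = defaultdict(list)
    for d, readings in history.items():
        key = "weekend" if d.weekday() >= 5 else "weekday"
        groups[key].append(normalise(readings))
    return {k: [sum(col) / len(col) for col in zip(*days)]
            for k, days in groups.items()}

def project(patterns, start, n_days):
    """Assign each future day the pattern of its group."""
    out = {}
    for i in range(n_days):
        d = start + timedelta(days=i)
        key = "weekend" if d.weekday() >= 5 else "weekday"
        out[d] = patterns[key]
    return out
```

A fuller version would also encode the annual effects mentioned above, such as public-holiday dips for offices or term-time patterns for universities.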


When it comes to time-series forecasting there are a plethora of approaches to choose from: all the way from univariate statistical models such as ARIMA through to deep learning approaches such as RNNs and LSTMs.

In this case, while the chosen architecture requires online training, we still need to be mindful of performance and costs as the data volume scales. Unfortunately, due to the way sequences are handled in RNN and LSTM architectures, they can be very slow to train. The solution is to use another type of deep learning architecture – convolutional neural networks (CNNs).

Normally, CNN architectures are used for tasks such as image recognition, and work by using hidden layers that apply a short filter window to the input data, reducing the size of the resulting output. For a long stretch of time-series data, such as in our problem, this would still result in an enormous number of hidden neurons.

The model is fed a series of frames of shape Ntr × Mf, where Ntr is the number of training rows in a frame and Mf is the number of features. From each frame the model learns to predict the energy consumption a small window ahead, of size Npr × 1, where Npr is the number of predicted rows. Choosing the length of training data in a frame and how far to predict ahead is largely a balancing act of accuracy vs training performance. Making the frame window too long will reduce the number of frames to train on, but will allow the model to go further back to learn from – however there is a limit to causality (i.e. did the temperature 6 months ago really affect the energy consumption yesterday?). Making the frame window too short will not give the model enough data to learn from.
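The framing step described above is a standard sliding window. The sketch below slices a multivariate series into (Ntr × Mf) input frames paired with Npr-step-ahead targets; the window lengths and toy features are illustrative assumptions.

```python
# Sliding-window framing: (n_tr x m_f) input frames with n_pr-step targets.
def make_frames(features, energy, n_tr, n_pr):
    """features: list of m_f-length rows; energy: list of consumption values."""
    frames, targets = [], []
    for start in range(len(features) - n_tr - n_pr + 1):
        frames.append(features[start:start + n_tr])               # past inputs
        targets.append(energy[start + n_tr:start + n_tr + n_pr])  # rows ahead
    return frames, targets

# 10 timesteps, 2 features (e.g. temperature and humidity, made up here)
feats = [[t, t * 0.5] for t in range(10)]
load = [100 + t for t in range(10)]
X, y = make_frames(feats, load, n_tr=4, n_pr=2)
```

The trade-off in the text falls straight out of the arithmetic: with a series of length L, there are only L - Ntr - Npr + 1 frames, so a longer window means fewer training examples.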

To reduce the number of neurons required in the hidden layers, we can use a dilated-CNN architecture, which skips historical points within the frame window. This vastly reduces the size of the model, results in a shorter and less computationally intensive training time and has the bonus effect of reducing overfitting. Applying p dilated 1D convolutions reduces the number of neurons to 1/2^p of the size of the input dataset.
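To make the dilation idea concrete, here is a pure-Python sketch of a causal dilated 1D convolution with a kernel of size 2. Stacking p such layers with dilations 1, 2, 4, ..., 2^(p-1) lets each output see 2^p past timesteps while each layer stays small, which is the efficiency the dilated-CNN architecture relies on. The fixed averaging weights are illustrative, not learned.

```python
# Causal dilated 1-D convolution, kernel size 2, fixed illustrative weights.
def dilated_conv(xs, w0, w1, dilation):
    """y[t] = w0 * x[t - dilation] + w1 * x[t]; out-of-range taps are zero."""
    out = []
    for t in range(len(xs)):
        past = xs[t - dilation] if t - dilation >= 0 else 0.0
        out.append(w0 * past + w1 * xs[t])
    return out

signal = [float(i) for i in range(8)]
layer = signal
for d in (1, 2, 4):  # p = 3 layers -> receptive field of 2**3 = 8 timesteps
    layer = dilated_conv(layer, w0=0.5, w1=0.5, dilation=d)
# With averaging weights, the last output blends the entire 8-step history.
```

With only three small layers, the final output at each timestep already depends on eight input points; a non-dilated stack would need far more neurons to cover the same history.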

Other commonly used deep learning techniques were applied, such as skip connections (to reduce the chance of the network becoming stuck at a local minimum by connecting non-adjacent layers) and dropout regularisation (to further reduce overfitting by randomly disabling neurons during training).


The model was evaluated against a database of buildings that had multiple years of data so that the model could learn from a portion of the data and be evaluated against the ‘future’ real data. The goal was to meet the entire building calibration standard as defined by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE). In Guideline 14, they state that a building can be considered calibrated if the Normalised Mean Bias Error (NMBE) is less than 10% and the Coefficient of Variation of the Root Mean Square Error (CV(RMSE)) is less than 30%. Using this test database of buildings, this algorithm exceeded these standards when evaluating the 6-month ahead prediction.
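The two Guideline 14 criteria are simple to compute directly. The sketch below uses one common formulation (dividing by n rather than n-1); m denotes measured values, p the model's predictions, and both metrics are expressed as percentages.

```python
# ASHRAE Guideline 14 whole-building calibration criteria:
# |NMBE| < 10% and CV(RMSE) < 30%.
import math

def nmbe(measured, predicted):
    """Normalised Mean Bias Error, as a percentage."""
    n, mean_m = len(measured), sum(measured) / len(measured)
    return 100 * sum(m - p for m, p in zip(measured, predicted)) / (n * mean_m)

def cv_rmse(measured, predicted):
    """Coefficient of Variation of the RMSE, as a percentage."""
    n, mean_m = len(measured), sum(measured) / len(measured)
    rmse = math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)
    return 100 * rmse / mean_m

def is_calibrated(measured, predicted):
    return abs(nmbe(measured, predicted)) < 10 and cv_rmse(measured, predicted) < 30
```

NMBE catches systematic over- or under-prediction (errors of opposite sign cancel), while CV(RMSE) catches large point-wise errors even when the bias nets out to zero, which is why both are required.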

The main strength of this model is that no prior knowledge of the seasonality of the energy profile is required, since the weather features cover the low frequency changes, and the human occupancy gives the model information about the high frequency changes. Therefore, predictions can be made on a wide range of building types with diverse operating patterns without any real knowledge of the building itself. This is the main benefit of not attempting to build a universal solution to this problem.


One of the model features is human occupancy. As discussed, this is very rarely directly measured in a building and must be inferred from the energy data. Since the approach here normalises patterns, an astute reader might ask why these patterns are not used directly for prediction. Whilst this would be a very simple and easy way to produce an output, the normalisation removes the effect of the weather features and usage changes, so it does not work for buildings with a high degree of weather-driven seasonality or sudden changes in usage.

The other features come from a weather API. APIs generally source this data from weather stations, which are very unlikely to be located at the building itself. Therefore, effects such as urban heat islanding[iv] can result in the measured weather data not reflecting the true values experienced at the building. Fortunately, the weather API will return the station coordinates so these can be compared to the building location, but in rare cases where the weather station is too far away, caution should be applied to the accuracy of the results.
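That sanity check is just a great-circle distance between the station coordinates returned by the API and the building's location. A sketch using the haversine formula is below; the 30 km threshold is an illustrative assumption, not a value from the article.

```python
# Flag weather stations that are far from the building being modelled.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def station_too_far(building, station, limit_km=30.0):
    """building and station are (lat, lon) tuples; limit_km is an assumption."""
    return haversine_km(*building, *station) > limit_km
```

Buildings flagged by this check can still be modelled, but their results should carry the caution described above.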

When considering the reliability of these predictions, sudden changes in use case can break the model forecast. For example, if a factory purchases a new piece of equipment, the energy profile can change quite dramatically. This can be accounted for in the historical data by training separate models for each usage case (assuming there is enough data) but if this takes place after the retrofit is applied, then this previously unseen profile could render these predictions invalid.

As discussed earlier when evaluating a potential retrofit, future weather forecasts need to be obtained in order to make a comparison. However, if this forecast ends up being incorrect (i.e. the real weather ends up being colder or warmer than at the time of the comparison) this could make the initial comparison under or over-estimate the benefit of the retrofit. It is recommended that once the real weather data has been recorded, a retrospective comparison is carried out.


To assess how energy consumption will change when a retrofit is applied to a building we first need to capture how the building operates currently, and this can be compared to the actual performance after the retrofit is applied. A/B testing cannot be used because there is only one copy of the building, and traditional physics-based approaches often require information that can only be obtained through intrusive physical surveys which do not scale well to a large portfolio of buildings.

By using a dilated-CNN architecture to learn the transfer function we can use temperature, humidity and human occupancy as features to learn energy consumption on a historical training dataset of at least 12 months in length. After a retrofit has been applied, this can be used to estimate the energy consumption that would have taken place without it, and therefore a comparison can be made to determine whether the change was successful.

This work was developed by arbnco and is patented worldwide under patent WO2022043403A1.

[i] UK Green Building Council, Climate Change Mitigation. Visited 16th November 2023.

[ii] World Economic Forum, Why buildings are the foundation of an energy-efficient future. Visited 16th November 2023, efficient-energy-ecosystem

[iii] EnergyPlus, EnergyPlus. Visited 20th November 2023.

[iv] National Geographic, Urban Heat Island. Visited 29th November 2023.

© Data Science Talent Ltd, 2024. All Rights Reserved.