HVAC systems are great. They let us enjoy a comfortable temperature and good air quality anytime. But this luxury comes at a great price – a price that can include heating unoccupied rooms and heat lost in distribution, not to mention the negative effects on our environment.

In order to feel comfortable, you have to do just one thing: make sure that the temperature in the room is optimal while there are people in it. That is it! What “optimal temperature” means is up to you to decide. A helpful tip from the US government: if you’re in a mild climate, dialing back the HVAC system by 4ºC (about 7ºF) can help save around 10% of energy annually.

Just imagine how much energy could be saved – and reused or resold by an energy provider – if we used the HVAC system only when needed!

SmartCat needed to solve this problem, and we did it using Reinforcement Learning. Let’s talk about our approach!

As always, it all starts with a good dataset and the power of machine learning! We received a large amount of data on indoor and outdoor temperatures, and on the energy spent on HVAC devices, collected over the course of a few years from industrial buildings. Our algorithm showed that we could use around 30% less energy just by relying on occupancy times and the buildings’ insulation!

As we did not have data on each building’s insulation, we used the data to learn it. Let’s say it’s a chilly day in November – 15ºC (60ºF). We’ve got two buildings in the neighborhood – both are workspaces and everyone has gone home. There’s building A, which cools down by only 1ºC per hour (I mean, it is November). And there’s also building B, which cools down by 5ºC per hour. It’s pretty safe to say that building A has much better insulation than building B, because it cools down more slowly. There’s a lot to learn from such examples! And we did. We created a regression model for each building and used these models as thermal models in further calculations.
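The idea can be sketched as an ordinary least-squares fit: predict the indoor temperature one hour ahead from the current indoor temperature, the outdoor temperature and the HVAC state. This is a toy illustration with made-up numbers and an assumed feature layout, not our actual data or pipeline:

```python
# Toy per-building thermal model: least squares over hourly readings.
# Feature layout and values are illustrative assumptions.
import numpy as np

# Features per hour: [indoor_temp, outdoor_temp, hvac_on]
X = np.array([
    [22.0, 15.0, 0],
    [21.0, 13.0, 0],
    [20.5, 14.0, 1],
    [23.0, 16.0, 1],
])
# Indoor temperature one hour later
y = np.array([21.3, 20.2, 21.35, 23.8])

# Append an intercept column and fit ordinary least squares
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# How does the building drift with the HVAC off on a 15 ºC day?
next_temp = np.array([23.0, 15.0, 0, 1.0]) @ coef
```

The fitted coefficients effectively encode the building’s insulation: the smaller the pull toward the outdoor temperature, the better insulated the building.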

When talking about Reinforcement Learning, there are two entities that you should keep in mind – an agent and an environment. The goal of a reinforcement learning task is to find an optimal policy that an agent can use to properly behave in an environment. If we were to translate this to our use case, our goal would be to create a controller (agent) that would properly control the indoor temperature of buildings (environment) using a set of optimal rules for turning heating/cooling devices on or off (policy).

We used the previously created thermal models as environment simulators. A simulator predicts the temperature at the next timestamp based on the current parameters. Now, our agent needs to find an optimal policy. An agent without a policy behaves arbitrarily. We can steer an agent toward a plausible outcome by reinforcing its actions with a reward function, devised so that it takes all of our limitations into account. In our case, there are only a few things to consider: the time of day, the desired temperature range and the day of the week.
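A reward function built from those three ingredients might look like the following sketch. The thresholds and weights are illustrative assumptions, not the values we used in production:

```python
# Hypothetical reward: favor temperatures inside the desired band during
# office hours, and add a small bonus whenever the devices stay off
# (i.e. less energy spent). All numbers are illustrative.
def reward(indoor_temp, hour, weekday, hvac_on,
           low=22.0, high=26.0, office_hours=range(9, 17)):
    r = 0.0
    in_office_hours = weekday < 5 and hour in office_hours  # Mon–Fri, 9–17
    if in_office_hours:
        # Comfort term: reward only when inside the desired band
        r += 1.0 if low <= indoor_temp <= high else -1.0
    # Energy term: small bonus for keeping the HVAC off
    if not hvac_on:
        r += 0.1
    return r
```

Balancing the comfort term against the energy term is the key design choice: too small an energy bonus and the agent never turns anything off, too large and it lets the office freeze.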

Here’s how we trained the agent

Using our software, the person in charge of cutting back on emissions can specify something like this: “The company works Monday to Friday from 9 a.m. to 5 p.m. We’d like our temperature to be between 22-26ºC (70-80ºF) during that time.” It is now the job of our algorithm to find the optimal policy that will control the temperature.
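A specification like that boils down to a very small configuration. This is a hypothetical sketch of how it could be captured, not our actual API:

```python
# Hypothetical comfort specification (names are illustrative)
comfort_config = {
    "office_days": ["Mon", "Tue", "Wed", "Thu", "Fri"],
    "office_hours": (9, 17),       # 9 a.m. to 5 p.m.
    "temp_range_c": (22.0, 26.0),  # desired indoor band in ºC
}
```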

And here’s how it would do it: the algorithm fetches the thermal model of the building (the model that describes the building’s insulation), downloads the predicted hourly temperatures for the day and starts exploring the environment. But! There are more rules to the game:

The algorithm rewards our agent when the temperature stays within the desired range during office hours, and it returns higher rewards when the agent spends less energy. That’s how we taught our agent to behave. To sum up: the previous approach was to keep the HVAC devices powered on throughout office hours; our new approach relies on the building’s insulation as much as possible to preserve energy.

A funny thing happened: the algorithm got smarter in ways we didn’t design

It started to warm up in advance

We noticed that the agent had learned to turn on the air conditioning devices even before office hours, so that when employees arrive, the indoor temperature is already within the desired limits.

It would quickly heat up, then let the environment cool itself down

We also learned that during office hours, the algorithm would heat the room to the upper temperature limit (in our example, 26ºC / 80ºF), then turn off the devices and allow the temperature to drop to the lower limit (in our example, 22ºC) before turning them on again.
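What the agent rediscovered is classic hysteresis (bang-bang) control. A toy simulation with made-up physics shows why it beats leaving the heater on all day:

```python
# Toy comparison: always-on heating vs. hysteresis control over an
# 8-hour workday, in one-minute steps. All numbers are illustrative.
def simulate(policy_fn, hours=8, temp=22.0, outdoor=15.0):
    energy = 0.0
    heating = False
    for _ in range(hours * 60):
        heating = policy_fn(temp, heating)
        if heating:
            temp += 0.05              # heater raises the temperature
            energy += 1.0             # one energy unit per heating minute
        temp -= 0.002 * (temp - outdoor)  # heat loss to the outside
    return energy

always_on = simulate(lambda t, h: True)
# Heat when below 22 ºC; once heating, keep going until 26 ºC
hysteresis = simulate(lambda t, h: t < 22.0 or (h and t < 26.0))

savings = 1.0 - hysteresis / always_on
```

The better the insulation (the smaller the loss coefficient), the longer the coast-down phase lasts and the larger the savings.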

This way, our algorithm saved ~30% more energy. And the best part? No new hardware necessary.

While there are many air conditioning systems nowadays that can automatically control the temperature using the built-in sensors, many buildings do not use them yet. This is why solutions like this one can be very cost-effective and beneficial as they don’t require expensive hardware changes.

We are excited to integrate our service and see how it behaves in a real system, so stay tuned!