Measuring smart thermostat energy impact and user experience


Feb 2017 - Oct 2020

My main focus as a Data Scientist at Uplight (formerly Tendril) was measuring the impact of the Orchestrated Energy (OE) product. OE was an AI algorithm that operated on users’ smart thermostats to optimize their schedules. Users would enroll in OE via their energy utility, and the utility would determine the goal of the optimization: overall energy savings, shifting energy use out of peak periods of the day, or minimizing energy use during extreme peak periods when the grid was at risk. The goal was for the program to learn the properties of individual houses and set the appropriate schedule without users noticing a difference in their comfort. My job was to measure both the impact on energy use and the impact on users.

Utilities are highly regulated in most parts of the US, and these regulations require them to measure and report the impact of programs like this. Since OE was an unusual program for the evaluators, we worked closely with them to make sure they understood the nuances of the program and experimental design.


  • Implement experimental designs that can accurately measure the energy impact of the program while maximizing that impact
  • Measure and report energy impact results using data from the smart thermostats
  • Assist with researching the user experience and make recommendations for improvements and priorities to the wider product team

Process at a glance

Design Experiment
  • Power analyses
  • Within-subjects study with randomized control days
  • Feasibility and implementation

Analyze energy impact
  • Equivalence tests
  • Fixed-effects linear regression
  • Data visualizations

Research user impact
  • User surveys
  • User interviews
  • Exploratory data analysis

Communicate results
  • Presentations and formal reports
  • Third-party evaluations
  • Poster presentation at energy efficiency conference

Design Experiment

The experimental design for this product was complicated by the requirement that everyone who signs up for the program should participate in the program as long as they are eligible. Because of this, we couldn’t do a simple randomized control group. There were also different possible goals and metrics that needed to be studied, which often overlapped and combined in various ways. The experimental designs, therefore, were fairly complicated to communicate to the internal team, client, and evaluator, as well as to implement and monitor in the software.

For overall energy savings, we implemented an on-day/off-day design, where we randomly interspersed days without any optimization. The users’ thermostats just ran their normal schedules on these days, and we were able to use this data to predict baseline energy use and compare it to the data from days when OE was active. Throughout the years, the logic that randomly assigned these off-days changed a bit to account for carryover effects that we saw in the data, but the general idea remained intact.
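As a rough illustration of the on-day/off-day idea, here is a minimal Python sketch of random off-day assignment. It is not the production logic: the function name, the `off_fraction` and `min_gap` parameters, and the rule that off-days must be separated by at least one on-day (a stand-in for the carryover adjustments mentioned above) are all assumptions made for this example.

```python
import random
from datetime import date, timedelta


def assign_off_days(start, n_days, off_fraction=0.2, min_gap=1, seed=0):
    """Randomly label each day "on" (optimized) or "off" (normal schedule).

    Hypothetical parameters, chosen for illustration only:
      off_fraction: chance that an eligible day becomes an off-day
      min_gap: minimum number of on-days required between two off-days,
               a simple guard against carryover effects
    Because of the gap rule, the realized share of off-days is somewhat
    lower than off_fraction.
    """
    rng = random.Random(seed)
    schedule = {}
    on_days_since_off = min_gap  # lets the first day be eligible
    for i in range(n_days):
        day = start + timedelta(days=i)
        eligible = on_days_since_off >= min_gap
        is_off = eligible and rng.random() < off_fraction
        schedule[day] = "off" if is_off else "on"
        on_days_since_off = 0 if is_off else on_days_since_off + 1
    return schedule


if __name__ == "__main__":
    season = assign_off_days(date(2020, 6, 1), 30, seed=7)
    for day, label in season.items():
        print(day.isoformat(), label)
```

Fixing the seed makes the schedule reproducible, which helps when the same assignment has to be shared with an outside evaluator.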

On days with extreme weather, utilities would call a demand response (DR) event, where OE would do everything it could to not run the air conditioner or heater during peak hours, while still attempting to keep the user comfortable in their home. This was the main draw of OE for utilities because shifting residential energy use out of these extreme peak hours could mean the difference between having to build a new power plant or risking outages. Some utilities wanted to make sure every possible user was included in these events, even if it meant a less accurate measurement. To account for this, we allowed for two possible experimental designs on DR days. If the utility allowed it, we would randomly assign a control group for each event to serve as the baseline; otherwise, we would model a baseline from non-DR day data. The latter approach was not ideal because the baseline data came from days that are fundamentally different from DR days, which would introduce bias. I worked with utilities and their evaluators before each season to make sure they understood the tradeoffs of each approach.
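The preferred DR design, a fresh random control group for each event, can be sketched in a few lines of Python. The names `split_for_event` and `event_impact`, the 10% default control share, and the simple difference in means are assumptions for illustration; the actual group sizes and impact models are not described here.

```python
import random
from statistics import mean


def split_for_event(user_ids, control_fraction=0.1, seed=None):
    """Randomly split enrolled users into control and treatment for one DR event.

    control_fraction is a hypothetical share; each event would use its own seed
    so that the control group is re-drawn every time.
    """
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    n_control = max(1, round(len(shuffled) * control_fraction))
    control = set(shuffled[:n_control])
    treatment = set(shuffled[n_control:])
    return control, treatment


def event_impact(control_kwh, treatment_kwh):
    """Naive impact estimate: mean peak-hour kWh of control minus treatment."""
    return mean(control_kwh) - mean(treatment_kwh)


if __name__ == "__main__":
    users = [f"user_{i}" for i in range(50)]
    control, treatment = split_for_event(users, control_fraction=0.1, seed=2020)
    print(len(control), "control users,", len(treatment), "treated users")
```

Because the control group is drawn from the same population on the same day, it shares the event-day weather with the treated users, which is exactly what a baseline modeled from non-DR days lacks.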

Below is an example of a program’s experimental design, which measures both energy efficiency (EE) savings and DR results using the preferred design:


Analyze Energy Impact

Once the experiment had been running for a few weeks, I would run various fixed-effects regression models and report on the energy impact. During each season, I would provide results regularly to the team so they could make any necessary adjustments. Then at the end of the season, we would do more in-depth analyses to understand the full impact. In later years, the models were automated and results populated internal and external dashboards, so I was more involved in maintaining the software and interpreting any irregularities in the results. Below are some anonymized examples of reported results:


These quick-and-dirty visualizations eventually served as a starting point for the design team to construct the dashboards that utility clients used to monitor their programs. They also shaped the internal team’s understanding of the energy impact of OE.
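To make the fixed-effects regression mentioned above more concrete, here is a minimal Python sketch of the within (demeaning) estimator for a model of the form kWh = user effect + beta × OE-active + error. It is a deliberately stripped-down illustration: the function name and row layout are assumptions, and real models of this kind would typically also include weather and time covariates that are omitted here.

```python
from collections import defaultdict


def within_estimate(rows):
    """Estimate beta in kwh = alpha_user + beta * oe_active + error.

    rows: iterable of (user_id, oe_active, kwh), one row per user-day.
    Subtracting each user's own means removes the per-user intercepts
    (the fixed effects), so beta is identified from within-user contrasts
    between on-days and off-days.
    """
    by_user = defaultdict(list)
    for user, oe_active, kwh in rows:
        by_user[user].append((float(oe_active), float(kwh)))

    sxy = 0.0  # sum of demeaned x times demeaned y
    sxx = 0.0  # sum of squared demeaned x
    for obs in by_user.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("no within-user variation in oe_active")
    return sxy / sxx


if __name__ == "__main__":
    # Toy data: two homes with different baselines and the same OE effect.
    toy = [
        ("a", 0, 10.0), ("a", 1, 8.0), ("a", 0, 10.0), ("a", 1, 8.0),
        ("b", 0, 20.0), ("b", 1, 18.0),
    ]
    print("estimated daily kWh change with OE active:", within_estimate(toy))
```

A negative estimate means lower energy use on OE days. The per-user demeaning is what lets homes with very different baseline consumption be pooled in one model.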

Research User Impact

In addition to the impact on energy use, the promise of OE was that we could achieve those results while keeping the user comfortable in their home. To understand whether this was really the case, we used several different research approaches:

  • Regular user surveys asking about various aspects of the program
  • In-depth user interviews
  • Quantitative analysis of behaviors captured by the thermostat (e.g. how often did users adjust the temperature when OE was active vs. when it was not)
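The behavioral comparison in the last bullet can be sketched as a simple rate calculation. This Python example assumes a hypothetical record layout of one row per user-day with a count of manual temperature adjustments; the actual thermostat data format is not described here.

```python
from collections import defaultdict


def adjustment_rates(daily_records):
    """Mean manual adjustments per user-day, split by whether OE was active.

    daily_records: iterable of (user_id, oe_active, n_adjustments).
    Returns a dict keyed by True (OE active) and False (OE not active).
    """
    totals = defaultdict(int)
    days = defaultdict(int)
    for _user, oe_active, n_adjustments in daily_records:
        key = bool(oe_active)
        totals[key] += n_adjustments
        days[key] += 1
    return {key: totals[key] / days[key] for key in days}


if __name__ == "__main__":
    records = [
        ("a", True, 3), ("a", False, 1),
        ("b", True, 0), ("b", False, 1),
    ]
    print(adjustment_rates(records))
```

A noticeably higher rate on OE days would suggest users were overriding the optimized schedule, which is the kind of signal that could then be followed up in surveys and interviews.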

Our survey approach changed a bit over the years, but generally, we tried to reach users after each DR event so the experience was fresh in their minds. These surveys were short (5-10 questions) since they would be sent out several times throughout the season. We also sent out longer surveys at the end of each season to get a better understanding of the overall experience. We saw response rates and completion rates well above industry benchmarks for both kinds of surveys. Some of the topics covered were:

  • overall satisfaction with the program
  • level of understanding of the program
  • comfort in the home
  • frustrations and pain points
  • interest in future programs or new features
  • open-ended feedback

Below are some sample questions and results, including the takeaways communicated to the larger team:


We also used the surveys to recruit participants who were willing to give us an in-depth interview about their experiences. This was an excellent opportunity to dig into the reasons people reacted the way they did and home in on which aspects of the program they found frustrating. The survey results and quantitative analyses that I ran were instrumental in figuring out which lines of questioning to pursue, so I worked closely with the UX research team in this process, including sitting in on several sessions and serving as a notetaker.

One thing that really stood out from this research is that users didn’t necessarily want the program to just be running in the background without them noticing. A lot of people were very interested in what was going on and why their thermostat schedule was being set in certain ways. This led us to modify our strategies for communicating the details of the program to the user and the way we talked about its overall value proposition.

Communicate Results

There were many points of communication throughout this process, both with the internal product and engineering teams and with the stakeholders at the client utilities. These programs had a lot of small details and nuances that were very important to communicate effectively. Over the years, I got much better at visualizing the information I was trying to convey, and understanding the level of technical detail that was required for various audiences. I very quickly became the subject matter expert in how OE was measured and repeatedly received very positive feedback from the wider team.

I also co-wrote a paper and presented a poster at an energy efficiency conference, which was used as a marketing tool (see blog post) for the product and the company as a whole.

Reflect and Learn

  • This is one of the projects that has allowed me to grow the most and have the greatest impact in my career so far. Over the years, I got to know the product inside and out and served as an invaluable resource for the team. I learned a ton about the technical side and built crucial skills in collaboration and communication that I’m sure will serve me throughout my career.
  • One of the most interesting aspects of this project to me was the intersection of quantitative and qualitative research. As the data scientist on the team, most of my time was spent on the quantitative side of things, which was extremely useful in understanding precisely what impact the product was having. But I really enjoyed dipping into the qualitative side of the research to get at the why behind those numbers. Reflecting on this experience was a huge part of my decision to transition from data science to product design.
  • This product went through a lot of pivots and restructuring of priorities. Along with the larger team, I learned a lot about how to question our assumptions about what our customers and users really wanted, and how to adjust our offering to make it as successful as possible.
  • We also had to think very hard about the different needs we were serving. Our end users were the people living in the houses and being comfortable or uncomfortable. On the other hand, our paying customers were the utilities who wanted to see the best results possible. These interests were often at odds with each other and the team had to make tough decisions about what to prioritize. Being thorough and objective in the research process was crucial to these decisions.
  • I’m very proud of the work I did on this project, and could not have done it without the phenomenal team that I worked with.
