Machine learning method to identify residential PV adopters, reduce soft costs

Researchers have developed a machine learning-based methodology that reportedly reduces customer acquisition costs by about 15%, or $0.07/Watt. It is based on an adapted version of the XGBoost algorithm and considers factors such as summer bills, household income, and the homeowner’s age, among others.

May 9, 2023 Emiliano Bellini

New market opportunities for PV companies relative to logistic regression

Image: Renmin University of China, Scientific Reports, Creative Commons License CC BY 4.0

From pv magazine Global

An international research team has used a machine learning algorithm known as XGBoost (eXtreme Gradient Boosting) to predict PV adoption among homeowners. XGBoost is a distributed gradient-boosted decision tree (GBDT) machine learning library that can help accurately predict a target variable by combining the estimates of an ensemble of simpler, weaker models.
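For readers curious how "combining an ensemble of estimates" works in practice, here is a minimal, pure-Python sketch of second-order gradient boosting with one-split trees (stumps) on the logistic loss. It borrows XGBoost's Newton-style leaf weight, -G/(H+λ), and its split gain (up to a constant factor and without the γ complexity penalty), but it is a toy illustration, not the XGBoost library and not the researchers' adapted model. The function names, hyperparameters, features, and data are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam):
    """Pick the (feature, threshold) split with the largest second-order gain
    GL^2/(HL+lam) + GR^2/(HR+lam) - G^2/(H+lam) and return the Newton leaf
    weights -G/(H+lam) for each side. Requires lam > 0."""
    G, H = sum(g), sum(h)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            GL = sum(gi for row, gi in zip(X, g) if row[j] <= t)
            HL = sum(hi for row, hi in zip(X, h) if row[j] <= t)
            GR, HR = G - GL, H - HL
            gain = GL**2 / (HL + lam) + GR**2 / (HR + lam) - G**2 / (H + lam)
            if best is None or gain > best[0]:
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    return best[1:]


def stump_output(stump, row):
    j, t, w_left, w_right = stump
    return w_left if row[j] <= t else w_right


def boost(X, y, rounds=20, eta=0.3, lam=1.0):
    """Fit `rounds` stumps additively on the logistic loss, using its
    gradient g = p - y and hessian h = p(1 - p) at the current margins."""
    margins = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]
        h = [pi * (1.0 - pi) for pi in p]
        j, t, w_left, w_right = fit_stump(X, g, h, lam)
        stump = (j, t, eta * w_left, eta * w_right)  # shrink by learning rate
        stumps.append(stump)
        margins = [m + stump_output(stump, row) for m, row in zip(margins, X)]
    return stumps


def predict_proba(stumps, row):
    """Sum the weak learners' outputs, then map the margin to a probability."""
    return sigmoid(sum(stump_output(s, row) for s in stumps))


# Hypothetical toy data: [summer_bill_usd, household_income_kusd]
X = [[50, 40], [60, 90], [180, 45], [200, 120],
     [220, 100], [70, 110], [190, 130], [40, 30]]
y = [0, 0, 0, 1, 1, 0, 1, 0]  # 1 = adopter (made-up labels)
model = boost(X, y)
print(predict_proba(model, [200, 120]), predict_proba(model, [50, 40]))
```

Each stump on its own is a weak predictor; the ensemble's prediction is the sum of all their small, shrunken contributions passed through the sigmoid.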

“We further dive into the modeling detail of XGBoost and decompose its enhanced prediction performance over logistic regression into two factors: variable interaction and nonlinearity,” the scientists said. “We last show the potential of XGBoost in reducing customer acquisition costs, and then the ability to identify new market opportunities for PV companies.”

According to them, this new methodology could help solar companies lower customer acquisition costs and other soft costs associated with the residential PV business.

They compared the performance of the proposed algorithm with the logistic regression approach, which the researchers described as the most commonly used method to analyze differences between PV adopters and non-adopters. “Our logistic regression model with nine original and highly visible household features successfully predicts 71% of out-of-sample PV adoption statuses,” they further explained. “The model correctly identified 66% of adopters and 75% of non-adopters.”

According to the research group, the adapted algorithm delivered better predictive performance than logistic regression. “The predictive model correctly predicted 87% of the two PV adoption statuses, compared to 71% for logistic regression,” they added. “The correct adopter rate increased from 66 to 87% and the correct non-adopter rate increased from 75 to 88%.”
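The three figures quoted above are standard confusion-matrix rates: the correct-adopter rate is what is often called sensitivity or recall, and the correct-non-adopter rate is specificity. The sketch below computes them from hypothetical counts (not the study's data). It also shows that overall accuracy is the average of the two class-specific rates, weighted by the number of adopters and non-adopters in the test set.

```python
def adoption_rates(tp, fn, tn, fp):
    """Return (overall accuracy, correct-adopter rate, correct-non-adopter rate).

    tp: adopters predicted as adopters      fn: adopters predicted as non-adopters
    tn: non-adopters predicted correctly    fp: non-adopters predicted as adopters
    """
    adopter_rate = tp / (tp + fn)        # sensitivity / recall
    non_adopter_rate = tn / (tn + fp)    # specificity
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, adopter_rate, non_adopter_rate


# Hypothetical hold-out set: 100 adopters and 100 non-adopters
acc, adopt, non_adopt = adoption_rates(tp=66, fn=34, tn=75, fp=25)

# Accuracy equals the class-size-weighted average of the two rates
n_adopters, n_non_adopters = 66 + 34, 75 + 25
weighted = (n_adopters * adopt + n_non_adopters * non_adopt) / (n_adopters + n_non_adopters)
print(acc, adopt, non_adopt, weighted)
```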

They attributed the superior performance of the machine learning-based approach to the fact that it captures complex nonlinearity and variable interactions across factors such as summer bills, household income, and the homeowner’s age.
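To make "variable interaction" and "nonlinearity" concrete, the sketch below contrasts a main-effects logistic regression with a hand-written depth-2 decision tree, the kind of building block that tree ensembles such as XGBoost add together. All coefficients, thresholds, and leaf values are hypothetical and are not taken from the study. In the logistic model, raising the summer bill shifts the log-odds by the same amount at every income level; in the tree, the size of that effect depends on income and changes in steps.

```python
def logistic_logit(bill, income, b0=-3.0, b_bill=0.01, b_income=0.02):
    # Main-effects logistic regression (hypothetical coefficients):
    # log-odds are linear and additive in the features.
    return b0 + b_bill * bill + b_income * income


def tree_logit(bill, income):
    # Hypothetical depth-2 tree: the effect of a high bill depends on
    # income (interaction) and jumps at thresholds (nonlinearity).
    if bill <= 150:
        return -2.0
    return 1.5 if income > 80 else -0.5


def bill_effect(logit_fn, income, low=100, high=200):
    """Change in log-odds when the summer bill rises from `low` to `high`."""
    return logit_fn(high, income) - logit_fn(low, income)


for income in (50, 120):
    print(income, bill_effect(logistic_logit, income), bill_effect(tree_logit, income))
```

A logistic regression can represent such effects only if the analyst adds interaction terms or transformed variables by hand, whereas tree ensembles can learn them from the data.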

“The advantage of using these variables is that they are highly accessible so that PV companies can collect data on them with little cost,” they also noted. “Another reason to explain the improved performance of XGBoost is that it can potentially recover key latent information embedded in the data. For example, including geographical information such as the state or county of the respondent increases the prediction accuracy of logistic regression to some extent.”
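Geographical variables such as the state are categorical, so to enter a logistic regression they are typically converted into 0/1 indicator (dummy) columns. A minimal sketch of that one-hot encoding follows; the state codes are hypothetical, and the article does not say how the researchers encoded location.

```python
def one_hot(values):
    """Map each categorical value (e.g. a state code) to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical respondents' states; the dummy columns would be appended
# to the household features fed to a logistic regression.
states = ["CA", "AZ", "CA", "NJ"]
categories, dummies = one_hot(states)
print(categories, dummies)
```

In a model with an intercept, one of the dummy columns is usually dropped as the reference category to avoid perfect collinearity.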

The research group estimated that the new methodology can help PV companies reduce customer acquisition costs by about 15%, or $0.07/Watt. It also noted that data mining and machine learning could help reduce soft costs related to contract cancellation, supply chain management, labor assignment, and permitting and inspection.
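Taken together, the two figures imply a baseline. Assuming both refer to the same per-watt customer acquisition cost C (an inference, not a number given in the article), and taking an illustrative 6 kW residential system (an assumed size):

```latex
\begin{aligned}
0.15\,C &\approx \$0.07/\mathrm{W} \;\Rightarrow\; C \approx \frac{0.07}{0.15} \approx \$0.47/\mathrm{W}\\
\text{savings for a 6 kW system} &\approx \$0.07/\mathrm{W} \times 6000\,\mathrm{W} = \$420
\end{aligned}
```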

The researchers described the new methodology in the study “Machine learning reduces soft costs for residential solar photovoltaics,” published in Scientific Reports. The research group comprises scientists from the US Department of Energy’s National Renewable Energy Laboratory (NREL), Lawrence Berkeley National Laboratory, Florida State University, the University of Wisconsin-Madison, and Renmin University of China.
