TY - JOUR
T1 - An interpretable machine learning model for predicting forest fire danger based on Bayesian optimization
AU - Liu, Zhiyang
AU - Zhou, Kuibin
AU - Yao, Qichao
AU - Reszka, Pedro
N1 - Publisher Copyright: © The Author(s).
PY - 2024
Y1 - 2024
AB - As global warming increases forest fire frequency, early prevention and effective management become crucial. This requires models that are both accurate and easily understood. However, traditional machine learning models, which typically use preset parameters, are often inaccurate and hard to interpret. Therefore, this study introduces an enhanced approach using data from 2000 to 2019 in the Sichuan and Yunnan provinces of China, incorporating 18 driving factors. Bayesian optimization algorithms based on two probabilistic surrogate models, the Gaussian Process (GP) and the Tree-structured Parzen Estimator (TPE), were used to optimize the hyperparameters of LightGBM, Random Forest (RF), and Support Vector Machine (SVM). Forest fire danger prediction models were then constructed to draw forest fire danger maps, and performance was compared across models. Predictive performance was evaluated using accuracy, recall, precision, balanced F score (F1), and area under the curve (AUC). The evaluation showed that TPE-LightGBM achieved high accuracy (AUC = 0.962). The forest fire danger map categorizes the study area into five danger levels; TPE-LightGBM classifies 62.58% of the study area as low-danger and 5.33% as high-danger (Level V). The Shapley additive explanation (SHAP) interpretation of TPE-LightGBM highlights daily average relative humidity, sunshine hours, elevation, daily average air pressure, and daily maximum ground surface temperature as the primary influential factors, followed by human activity as indexed by gross domestic product (GDP) and the distance to the nearest railway.
UR - http://www.scopus.com/inward/record.url?scp=85213843864&partnerID=8YFLogxK
U2 - 10.48130/emst-0024-0026
DO - 10.48130/emst-0024-0026
M3 - Article
AN - SCOPUS:85213843864
SN - 2832-448X
VL - 4
JO - Emergency Management Science and Technology
JF - Emergency Management Science and Technology
M1 - e025
ER -