TY - JOUR
T1 - Comparing Generalized Linear Models and random forest to model vascular plant species richness using LiDAR data in a natural forest in central Chile
AU - Lopatin, J.
AU - Dolos, K.
AU - Hernández, H. J.
AU - Galleguillos, M.
AU - Fassnacht, F. E.
N1 - Funding Information:
This work was partially funded by the CONICYT project Integration of Advanced Human Capital into the Academy (code 791100013) and by U-INICIA VID 2012 (code 1/0612), University of Chile. The authors would furthermore like to thank two anonymous reviewers for their valuable comments, which helped to improve an earlier version of the manuscript. Kyle Pipkins is acknowledged for proofreading the manuscript. Finally, we would like to thank Dr. Florian Hartig for his advice concerning the selection of the statistical test.
Publisher Copyright:
© 2015 Elsevier Inc.
PY - 2016/2/1
Y1 - 2016/2/1
N2 - Biodiversity is considered to be an essential element of the Earth system, driving important ecosystem services. However, the conservation of biodiversity in a quickly changing world is a challenging task that requires cost-efficient and precise monitoring systems. In the present study, the suitability of airborne discrete-return LiDAR data for mapping vascular plant species richness within a Sub-Mediterranean second-growth native forest ecosystem was examined. The vascular plant richness of four different layers (total, tree, shrub and herb richness) was modeled using twelve LiDAR-derived variables. As species richness values are typically count data, the corresponding asymmetry and heteroscedasticity in the error distribution have to be considered. In this context, we compared the suitability of random forest (RF) and a Generalized Linear Model (GLM) with a negative binomial error distribution. Both models were coupled with a feature selection approach to identify the most relevant LiDAR predictors and keep the models parsimonious. The results of RF and GLM agreed that the three most important predictors for all four layers were altitude above sea level, standard deviation of slope and mean canopy height. This was consistent with the presumed basis of LiDAR's suitability for estimating species richness, namely its capacity to capture three types of information: micro-topographical, macro-topographical and canopy structural. Generalized Linear Models showed higher performance (r²: 0.66, 0.50, 0.52, 0.50; nRMSE: 16.29%, 19.08%, 17.89%, 21.31% for total, tree, shrub and herb richness, respectively) than RF (r²: 0.55, 0.33, 0.45, 0.46; nRMSE: 18.30%, 21.90%, 18.95%, 21.00% for total, tree, shrub and herb richness, respectively). Furthermore, the best GLMs were more parsimonious (three predictors) and less biased than the best RF models (twelve predictors).
We think that this is due to the aforementioned non-symmetric error distribution of the species richness values, which RF is unable to capture properly. From an ecological perspective, the predicted patterns agreed well with the known vegetation composition of the area. We found especially high species numbers at low elevations and along riversides. In these areas, the distributions of thermophilous sclerophyllous species, water-demanding Valdivian evergreen species and species growing in Nothofagus obliqua forests overlap. The three main conclusions of the study are: 1) appropriate model selection is crucial when working with biodiversity count data; 2) the application of RF to data with non-symmetric error distributions is questionable; and 3) structural and topographic information derived from LiDAR data is useful for predicting local plant species richness.
AB - Biodiversity is considered to be an essential element of the Earth system, driving important ecosystem services. However, the conservation of biodiversity in a quickly changing world is a challenging task that requires cost-efficient and precise monitoring systems. In the present study, the suitability of airborne discrete-return LiDAR data for mapping vascular plant species richness within a Sub-Mediterranean second-growth native forest ecosystem was examined. The vascular plant richness of four different layers (total, tree, shrub and herb richness) was modeled using twelve LiDAR-derived variables. As species richness values are typically count data, the corresponding asymmetry and heteroscedasticity in the error distribution have to be considered. In this context, we compared the suitability of random forest (RF) and a Generalized Linear Model (GLM) with a negative binomial error distribution. Both models were coupled with a feature selection approach to identify the most relevant LiDAR predictors and keep the models parsimonious. The results of RF and GLM agreed that the three most important predictors for all four layers were altitude above sea level, standard deviation of slope and mean canopy height. This was consistent with the presumed basis of LiDAR's suitability for estimating species richness, namely its capacity to capture three types of information: micro-topographical, macro-topographical and canopy structural. Generalized Linear Models showed higher performance (r²: 0.66, 0.50, 0.52, 0.50; nRMSE: 16.29%, 19.08%, 17.89%, 21.31% for total, tree, shrub and herb richness, respectively) than RF (r²: 0.55, 0.33, 0.45, 0.46; nRMSE: 18.30%, 21.90%, 18.95%, 21.00% for total, tree, shrub and herb richness, respectively). Furthermore, the best GLMs were more parsimonious (three predictors) and less biased than the best RF models (twelve predictors).
We think that this is due to the aforementioned non-symmetric error distribution of the species richness values, which RF is unable to capture properly. From an ecological perspective, the predicted patterns agreed well with the known vegetation composition of the area. We found especially high species numbers at low elevations and along riversides. In these areas, the distributions of thermophilous sclerophyllous species, water-demanding Valdivian evergreen species and species growing in Nothofagus obliqua forests overlap. The three main conclusions of the study are: 1) appropriate model selection is crucial when working with biodiversity count data; 2) the application of RF to data with non-symmetric error distributions is questionable; and 3) structural and topographic information derived from LiDAR data is useful for predicting local plant species richness.
KW - Alpha-diversity
KW - Bootstrap validation
KW - GLM
KW - LiDAR data
KW - Random forest
KW - Species richness
UR - http://www.scopus.com/inward/record.url?scp=84949309661&partnerID=8YFLogxK
U2 - 10.1016/j.rse.2015.11.029
DO - 10.1016/j.rse.2015.11.029
M3 - Article
AN - SCOPUS:84949309661
SN - 0034-4257
VL - 173
SP - 200
EP - 210
JO - Remote Sensing of Environment
JF - Remote Sensing of Environment
ER -