Landslide Susceptibility Assessment in the Middle Reaches of the Nujiang River Driven by Explainable Machine Learning
Abstract
Landslides occur frequently in the middle reaches of the Nujiang River owing to its complex tectonic and geological environment. Landslide susceptibility assessment identifies the areas most prone to landsliding and thereby supports more efficient disaster prevention and mitigation in this region. Based on historical data, remote sensing interpretation, and field investigations, a total of 3358 medium- to large-scale landslides (volume > 10⁵ m³) were identified to construct a landslide inventory for the middle reaches of the Nujiang River. Twelve conditioning factors, covering topography, basic geology, hydrology, environmental influences, and external triggers, were selected using variance inflation factor (VIF) and tolerance analysis. The landslide samples from the southern part of the study area, where landslides are relatively concentrated, were used as the training set, while those from the remaining regions served as the test set, giving an approximately 1:1 ratio between the training and test sets. This spatial partitioning strategy was employed to evaluate the cross-regional generalization ability of the machine learning models. Random Forest (RF), Naive Bayes (NB), and eXtreme Gradient Boosting (XGBoost) models were applied to predict landslide susceptibility across the entire study area. The results indicate that very high and high susceptibility zones are primarily concentrated in the valleys of the Nujiang River and its tributaries, influenced by faults, intense topographic rockmass, and well-developed drainage networks. These patterns are generally consistent with the actual spatial distribution of landslides in the study area. Among the three models, the RF model achieved the highest AUC (0.880), followed by NB (0.862) and XGBoost (0.853). Furthermore, the landslide susceptibility map generated by the RF model shows higher accuracy (86.5%) and reliability (Kappa = 0.730).
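As an illustration of how the accuracy and Kappa figures relate, the following minimal sketch computes both from a binary confusion matrix. The function name `accuracy_and_kappa` and the counts are hypothetical and are not taken from the study; the counts were chosen only so that the accuracy equals 86.5%. With equal numbers of landslide and non-landslide samples and symmetric errors, an accuracy of 0.865 corresponds to a Kappa of 0.730, which is consistent with the reported pair of values, although the abstract does not state the class balance.

```python
def accuracy_and_kappa(tp, fn, fp, tn):
    """Return (overall accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tp/fn: landslide samples predicted as landslide / non-landslide
    fp/tn: non-landslide samples predicted as landslide / non-landslide
    """
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: the share of samples classified correctly.
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the actual and predicted marginals.
    p_chance = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


# Hypothetical, balanced counts (assumption: not the study's data).
acc, kappa = accuracy_and_kappa(tp=865, fn=135, fp=135, tn=865)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa discounts the agreement that a random classifier with the same marginals would reach, so it is a stricter measure than raw accuracy when the classes are imbalanced.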
SHAP (SHapley Additive exPlanations) interpretation reveals that elevation is the most important factor influencing landslide susceptibility in all three models. The results indicate that the RF, NB, and XGBoost models all exhibit strong cross-regional generalization capabilities. However, the RF model achieves the highest AUC value and is therefore more suitable for landslide susceptibility assessment in areas characterized by large elevation gradients and complex geological environments. The findings of this study provide a reference for landslide susceptibility evaluation in deeply incised river valleys with complex geological and tectonic environments, and offer theoretical support for landslide risk management and disaster prevention and mitigation in the middle reaches of the Nujiang River.
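To make the idea behind SHAP attributions concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating every ordering of the features. It is not the SHAP library and not the study's workflow: the function `toy_score`, its coefficients, and the factor names `slope` and `fault_proximity` are hypothetical, and "absent" features are simply set to a baseline value. Brute-force enumeration is only feasible for a handful of features; with twelve factors there are 12! orderings, so practical tools rely on approximations or model-specific algorithms.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline point.

    A feature that is "absent" from a coalition keeps its baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            # Add the feature to the coalition and record its marginal effect.
            current[name] = x[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_score(f):
    # Hypothetical susceptibility score with one interaction term.
    return (0.5 * f["elevation"] + 0.2 * f["slope"]
            + 0.1 * f["elevation"] * f["fault_proximity"])


x = {"elevation": 1.0, "slope": 1.0, "fault_proximity": 1.0}
baseline = {"elevation": 0.0, "slope": 0.0, "fault_proximity": 0.0}
phi = shapley_values(toy_score, x, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split equally between the two features involved. Ranking factors by the magnitude of such attributions is what underlies a statement like "elevation is the most important factor".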