ISSN 1003-8035 CN 11-2852/P

    Landslide susceptibility assessment in the middle reaches of the Nujiang River driven by explainable machine learning

    • Abstract: The middle reaches of the Nujiang River have a complex geological and tectonic setting, and landslides occur frequently. Landslide susceptibility assessment can identify areas of high susceptibility and thereby improve the efficiency of disaster prevention and mitigation in the region. Based on historical data, remote sensing interpretation, and field investigations, 3358 medium- to large-scale landslides (volume > 10⁵ m³) were identified and compiled into a landslide inventory for the middle reaches of the Nujiang River. Twelve conditioning factors, covering topography and geomorphology, basic geology, hydrogeology, environmental influences, and external triggers, were selected using the variance inflation factor (VIF) and tolerance. Landslide samples from the southern part of the study area, where landslides are relatively dense, were used as the training set, and the remaining landslides served as the test set (training set∶test set ≈ 1∶1). This spatial partitioning was used to evaluate the cross-regional generalization ability of the models. Random forest (RF), naïve Bayes (NB), and extreme gradient boosting (XGBoost) models were applied to predict landslide susceptibility across the entire study area. The results show that very high and high susceptibility zones are concentrated mainly in the valleys of the Nujiang River and its tributaries, where they are controlled by faults, strongly dissected and fragmented terrain, and well-developed drainage networks; this pattern is broadly consistent with the observed distribution of landslides in the study area. The RF model achieved the highest accuracy (AUC = 0.880), followed by NB (AUC = 0.862) and XGBoost (AUC = 0.853), and the susceptibility map produced by the RF model showed higher accuracy (86.5%) and reliability (kappa = 0.730). SHAP (SHapley Additive exPlanations) interpretation indicates that elevation is the most important factor for landslide susceptibility in all three models (RF, NB, and XGBoost). All three models show strong cross-regional generalization, but the RF model attains the highest AUC and is therefore better suited to landslide susceptibility assessment in areas with large topographic relief and complex geological environments.
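
The quantities named in the abstract (tolerance and VIF for factor screening, AUC, overall accuracy, and Cohen's kappa) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the confusion-matrix counts, and the scores below are hypothetical and chosen only for illustration. One thing the sketch makes visible is that, assuming the evaluation samples contain equal numbers of landslide and non-landslide points (an assumption, not stated in the abstract), chance agreement is 0.5 and an accuracy of 86.5% corresponds to kappa = 0.730.

```python
def vif_from_r2(r2):
    """VIF of one factor, given R^2 from regressing it on the other factors.

    Tolerance is 1 - R^2 and VIF is its reciprocal.
    """
    tolerance = 1.0 - r2
    if tolerance <= 0.0:
        raise ValueError("R^2 must be below 1 for a finite VIF")
    return 1.0 / tolerance


def accuracy(tp, fp, fn, tn):
    """Overall accuracy: share of samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal frequencies of each class.
    p_chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    return (p_observed - p_chance) / (1.0 - p_chance)


def auc_rank(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample count, fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical balanced confusion matrix: 1000 landslide, 1000 non-landslide samples.
tp, fp, fn, tn = 870, 140, 130, 860
demo_accuracy = accuracy(tp, fp, fn, tn)
demo_kappa = cohen_kappa(tp, fp, fn, tn)

# Hypothetical susceptibility scores for four samples (1 = landslide).
demo_auc = auc_rank([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

In practice these metrics would normally come from an established library rather than hand-written functions; the stdlib-only version is used here only so that the arithmetic is visible.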

       
