
Chinese Journal of Antituberculosis (中国防痨杂志) ›› 2025, Vol. 47 ›› Issue (8): 1053-1061. doi: 10.19982/j.issn.1000-6621.20250033

• Original Articles •

基于机器学习算法的诊断模型对结核性胸腔积液的应用价值

焦家欢1,2, 孙长峰2,3, 吴刚2, 黄富礼2, 盛云建2

  1. 乐山市人民医院感染性疾病科,乐山614000
  2. 西南医科大学附属医院感染科,泸州646000
  3. 西南医科大学附属医院感染与免疫实验室,泸州646000
  • 收稿日期:2025-01-21 出版日期:2025-08-10 发布日期:2025-08-01
  • 通信作者: 盛云建,Email: sheng200410@163.com
  • 基金资助:
    四川省感染性疾病科临床重点专科建设项目(川卫医政函〔2024〕116号)

The value of machine learning algorithm-based diagnostic models in tuberculous pleural effusion

Jiao Jiahuan1,2, Sun Changfeng2,3, Wu Gang2, Huang Fuli2, Sheng Yunjian2

  1. Department of Infectious Diseases, The People’s Hospital of Leshan, Leshan 614000, China
  2. Department of Infectious Diseases, The Affiliated Hospital of Southwest Medical University, Luzhou 646000, China
  3. Infection and Immunity Laboratory, The Affiliated Hospital of Southwest Medical University, Luzhou 646000, China
  • Received: 2025-01-21 Online: 2025-08-10 Published: 2025-08-01
  • Contact: Sheng Yunjian, Email: sheng200410@163.com
  • Supported by:
    Sichuan Provincial Key Clinical Specialty Construction Project for Infectious Diseases (川卫医政函〔2024〕116号)

摘要:

目的: 探索基于人工智能机器学习算法(machine learning algorithm,MLA)构建的诊断模型在结核性胸腔积液(tuberculous pleural effusion,TPE)诊断中的价值。方法: 采用回顾性研究方法,参照入组标准纳入2020年1月至2022年9月四川省乐山市人民医院收治的233例胸腔积液患者作为内部实验组,按是否诊断为TPE,将患者分为结核组(106例)和非结核组(127例)。运用R 4.1.1软件进行数据整理和统计分析,利用LASSO回归筛选变量,并依此分别构建随机森林(RF)、支持向量机-线性核(SVM-linear)、支持向量机-多项式核(SVM-polynomial)、多因素logistic回归等4种MLA开发诊断模型。通过受试者工作特征曲线下面积(AUC)评估不同模型的诊断性能,并与胸腔积液腺苷脱氨酶(ADA)的诊断效能进行比较。另纳入同期西南医科大学附属医院的141例胸腔积液患者(结核组101例,非结核组40例)进行外部验证。结果: LASSO回归分析显示,胸腔积液总蛋白、ADA、单核细胞占比,以及血清中性粒细胞占比、血小板计数、发热及盗汗均是发生TPE的危险因素(LASSO回归系数分别为0.216、0.058、0.003、0.049、0.000、0.045、1.605),而胸腔积液癌胚抗原(CEA)、多核细胞占比,以及外周血白细胞计数均与TPE出现概率更低相关(LASSO回归系数分别为-0.072、-0.029、-0.567)。构建的RF、SVM-linear、SVM-polynomial和多因素logistic回归分析等4种MLA开发的诊断模型对TPE的诊断敏感度分别为91.8%、84.5%、86.9%、85.4%;特异度分别为99.0%、81.6%、93.8%、81.6%;AUC值分别为0.988、0.875、0.959和0.886,均高于胸腔积液ADA(分别为83.1%、77.9%、0.820)。在外部验证中,RF、SVM-linear、SVM-polynomial和多因素logistic回归模型的AUC值分别为0.834、0.827、0.817和0.815。结论: 基于RF算法构建的TPE诊断模型具有最优质的诊断性能,可以更加简单、快速、有效地识别TPE。

关键词: 结核,胸膜, 胸腔积液, 诊断,计算机辅助, 模型,统计学, 算法, 人工智能

Abstract:

Objective: To explore the value of diagnostic models based on machine learning algorithms (MLAs) for the diagnosis of tuberculous pleural effusion (TPE). Methods: A retrospective study was conducted. A total of 233 patients with pleural effusion admitted to The People’s Hospital of Leshan from January 2020 to September 2022 who met the inclusion criteria were enrolled as the internal experimental group. Patients were divided into a tuberculosis group (n=106) and a non-tuberculosis group (n=127) according to whether TPE was diagnosed. Clinical data were processed and analyzed using R software (version 4.1.1). Least absolute shrinkage and selection operator (LASSO) regression was employed for variable selection, followed by the development of four MLA-based diagnostic models: random forest (RF), support vector machine with linear kernel (SVM-linear), support vector machine with polynomial kernel (SVM-polynomial), and multivariate logistic regression. The diagnostic performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC) and compared with that of pleural fluid adenosine deaminase (ADA). External validation was conducted using an independent cohort of 141 patients with pleural effusion (101 with TPE and 40 without TPE) from The Affiliated Hospital of Southwest Medical University during the same period. Results: LASSO regression analysis identified pleural fluid total protein, pleural fluid ADA, mononuclear cell ratio in pleural fluid, serum neutrophil ratio, platelet count, fever, and night sweats as risk factors for TPE (LASSO regression coefficients: 0.216, 0.058, 0.003, 0.049, 0.000, 0.045, and 1.605, respectively), whereas pleural fluid carcinoembryonic antigen (CEA), polymorphonuclear cell ratio in pleural fluid, and peripheral white blood cell count were associated with a lower probability of TPE (LASSO regression coefficients: -0.072, -0.029, and -0.567, respectively).
The four MLA-based diagnostic models achieved sensitivities for TPE of 91.8% (RF), 84.5% (SVM-linear), 86.9% (SVM-polynomial), and 85.4% (multivariate logistic regression); specificities of 99.0%, 81.6%, 93.8%, and 81.6%; and AUC values of 0.988, 0.875, 0.959, and 0.886, respectively, all exceeding the performance of pleural fluid ADA (sensitivity 83.1%, specificity 77.9%, AUC 0.820). In the external validation cohort, the AUCs of the RF, SVM-linear, SVM-polynomial, and multivariate logistic regression models were 0.834, 0.827, 0.817, and 0.815, respectively. Conclusion: The RF-based diagnostic model showed the best diagnostic performance for TPE among the models compared, and may offer a simpler, faster, and clinically effective way to identify TPE.
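For readers less familiar with the metrics reported above, the following is a minimal sketch of how AUC, sensitivity, and specificity can be computed from a model's predicted scores. It is not the authors' R code; the function names, the 0.5 threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (1 = TPE, 0 = non-TPE); not taken from the study.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
print(f"AUC={toy_auc:.3f} sensitivity={toy_sens:.2f} specificity={toy_spec:.2f}")
```

AUC is independent of any cut-off, whereas sensitivity and specificity depend on the threshold used to call a case positive, which is why a model comparison typically reports all three.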

Key words: Tuberculosis, pleural; Pleural effusion; Diagnosis, computer-assisted; Models, statistical; Algorithms; Artificial intelligence

CLC number: