The value of machine learning algorithm-based diagnostic models in tuberculous pleural effusion

doi:10.19982/j.issn.1000-6621.20250033

Abstract

Objective: To explore the value of diagnostic models built with machine learning algorithms (MLAs) in the diagnosis of tuberculous pleural effusion (TPE).

Methods: In this retrospective study, 233 patients with pleural effusion admitted to The People's Hospital of Leshan, Sichuan Province, from January 2020 to September 2022 were enrolled according to the inclusion criteria as the internal experimental group and divided by TPE diagnosis into a tuberculosis group (n=106) and a non-tuberculosis group (n=127). Data were processed and analyzed with R software (version 4.1.1). Least absolute shrinkage and selection operator (LASSO) regression was used for variable selection, and the selected variables were used to develop four MLA-based diagnostic models: random forest (RF), support vector machine with linear kernel (SVM-linear), support vector machine with polynomial kernel (SVM-polynomial), and multivariate logistic regression. The diagnostic performance of each model was evaluated by the area under the receiver operating characteristic curve (AUC) and compared with that of pleural fluid adenosine deaminase (ADA). External validation used an independent cohort of 141 patients with pleural effusion (101 with TPE and 40 without TPE) admitted to The Affiliated Hospital of Southwest Medical University during the same period.

Results: LASSO regression identified pleural fluid total protein, pleural fluid ADA, pleural fluid mononuclear cell ratio, blood neutrophil ratio, platelet count, fever, and night sweats as risk factors for TPE (penalty coefficients: 0.216, 0.058, 0.003, 0.049, 0.000, 0.045, and 1.605, respectively), whereas pleural fluid carcinoembryonic antigen (CEA), pleural fluid polymorphonuclear cell ratio, and peripheral white blood cell count were associated with a lower probability of TPE (penalty coefficients: -0.072, -0.029, and -0.567, respectively). The RF, SVM-linear, SVM-polynomial, and multivariate logistic regression models had sensitivities for TPE of 91.8%, 84.5%, 86.9%, and 85.4%; specificities of 99.0%, 81.6%, 93.8%, and 81.6%; and AUC values of 0.988, 0.875, 0.959, and 0.886, respectively, all higher than those of pleural fluid ADA alone (sensitivity 83.1%, specificity 77.9%, AUC 0.820). In the external validation cohort, the AUC values of the RF, SVM-linear, SVM-polynomial, and multivariate logistic regression models were 0.834, 0.827, 0.817, and 0.815, respectively.

Conclusion: The RF-based diagnostic model showed the best diagnostic performance for TPE among the models compared, offering a simpler, faster, and clinically effective way to identify TPE.

Key words: Tuberculosis, pleural; Pleural effusion; Diagnosis, computer-assisted; Models, statistical; Algorithms; Artificial intelligence
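The AUC used in the abstract to compare the models has a direct rank interpretation: it is the probability that a randomly chosen TPE case receives a higher model score than a randomly chosen non-TPE case, with ties counted as one half. The following is a minimal Python sketch of that calculation. The scores and labels are made-up toy values, not study data, and the function name `auc_from_scores` is illustrative only (the study itself used R 4.1.1).

```python
def auc_from_scores(scores, labels):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores, NOT study data): 1 = TPE, 0 = non-TPE.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(f"toy AUC = {toy_auc:.4f}")
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance, and 1.0 to a score that ranks every TPE case above every non-TPE case.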

焦家欢, 孙长峰, 吴刚, 黄富礼, 盛云建. 基于机器学习算法的诊断模型对结核性胸腔积液的应用价值[J]. 中国防痨杂志, 2025, 47(8): 1053-1061. doi: 10.19982/j.issn.1000-6621.20250033

Jiao Jiahuan, Sun Changfeng, Wu Gang, Huang Fuli, Sheng Yunjian. The value of machine learning algorithm-based diagnostic models in tuberculous pleural effusion[J]. Chinese Journal of Antituberculosis, 2025, 47(8): 1053-1061. doi: 10.19982/j.issn.1000-6621.20250033

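Figures and tables (7)

Table 1

Basic characteristics and biochemical indices of the tuberculosis and non-tuberculosis groups in the internal experimental group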

Variable	Total (n=233)	Tuberculosis group (n=106)	Non-tuberculosis group (n=127)	Test statistic	P value
Age (years)	54.71±21.15	40.23±20.31	66.80±12.48	t=11.746	<0.001
Sex				χ²=0.280	0.596
Male	163(70.0)	76(71.7)	87(68.5)
Female	70(30.0)	30(28.3)	40(31.5)
Fever				χ²=5.933	0.015
Yes	62(26.6)	38(35.8)	24(18.9)
No	171(73.4)	68(64.2)	103(81.1)
Cough				χ²=0.048	0.825
Yes	173(74.2)	83(78.3)	90(70.9)
No	60(25.8)	23(21.7)	37(29.1)
Shortness of breath				χ²=0.113	0.737
Yes	130(55.8)	64(60.4)	66(52.0)
No	103(44.2)	42(39.6)	61(48.0)
Night sweats				χ²=14.823	<0.001
Yes	13(5.6)	13(12.3)	0(0.0)
No	220(94.4)	93(87.7)	127(100.0)
Pleural effusion
Total protein (g/L)	45.56±9.08	50.29±5.79	41.59±9.45	t=-8.591	<0.001
Adenosine deaminase (U/L)	31.50±24.90	46.82±17.87	18.61±22.61	t=-10.397	<0.001
Lactate dehydrogenase (U/L)	796.26±1435.88	650.35±1450.27	919.01±1417.79	t=1.420	0.157
Cancer antigen 125 (U/mL)	391.95(217.03,600.00)	357.00(139.25,591.88)	429.50(281.50,600.00)	Z=2.569	0.021
Carcinoembryonic antigen (ng/mL)	1.17(0.50,3.21)	0.51(0.50,1.21)	2.37(0.93,90.98)	Z=7.729	<0.001
Cell count (×10⁹/L)	1.59(0.70,3.24)	2.19(1.22,3.50)	0.99(0.47,2.22)	Z=1.582	<0.001
Polymorphonuclear cell ratio	0.11(0.05,0.42)	0.06(0.03,0.10)	0.283(0.09,0.68)	Z=8.679	<0.001
Mononuclear cell ratio	0.89(0.58,0.95)	0.94(0.89,0.97)	0.73(0.32,0.91)	Z=-8.619	<0.001
Blood indices
Carcinoembryonic antigen (ng/mL)	1.83(0.97,3.63)	1.10(0.73,1.98)	2.09(1.21,14.52)	Z=5.361	<0.001
Cancer antigen 125 (U/mL)	74.00(43.70,134.90)	79.75(46.28,133.70)	71.50(43.35,129.75)	Z=0.016	0.531
White blood cell count (×10⁹/L)	8.20±4.08	6.17±1.87	9.87±4.62	t=8.229	<0.001
Neutrophil count (×10⁹/L)	6.25±3.86	4.26±1.60	7.87±4.39	t=8.580	<0.001
Neutrophil ratio	0.73±0.11	0.68±0.11	0.78±0.10	t=6.839	<0.001
Lymphocyte count (×10⁹/L)	1.45±3.68	1.67±5.32	1.26±1.23	t=-0.849	0.397
Lymphocyte ratio	0.16±0.09	0.19±0.09	0.14±0.08	t=-4.625	<0.001
Monocyte count (×10⁹/L)	0.70±0.77	0.59±0.27	0.78±1.01	t=2.063	0.041
Monocyte ratio	0.08±0.03	0.09±0.03	0.07±0.02	t=-7.030	<0.001
Hemoglobin (g/L)	124.59±19.36	124.81±18.23	124.42±20.32	t=-0.152	0.879
Platelet count (×10⁹/L)	285.24±119.75	328.58±104.88	249.75±119.85	t=-5.258	<0.001
Total protein (g/L)	65.28±8.18	66.48±7.98	64.30±8.24	t=-2.031	0.043
Alanine aminotransferase (U/L)	31.21±31.77	31.82±28.40	30.72±34.39	t=-0.261	0.794
Aspartate aminotransferase (U/L)	31.43±26.15	30.90±19.73	31.87±30.53	t=0.278	0.781
Creatine kinase (U/L)	58.72±44.11	58.21±41.31	59.18±46.74	t=0.132	0.895
Creatine kinase isoenzyme (U/L)	12.92±13.31	10.88±8.30	14.72±16.37	t=1.801	0.074
Prothrombin time (s)	13.20±1.47	13.25±1.15	13.15±1.69	t=-0.528	0.598
INR	1.11±0.67	1.17±0.97	1.06±0.13	t=-1.182	0.239
Interferon-γ release assay				χ²=99.872	<0.001
Positive	121(51.9)	93(87.7)	28(22.0)
Negative	112(48.1)	13(12.3)	99(78.0)
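As a worked check on how the test statistics in Table 1 relate to the tabulated summaries, the Python sketch below recomputes two of them from the published group values: the t statistic for age and the χ² statistic for the interferon-γ release assay. It assumes Welch's unequal-variance t test and a Pearson chi-square without continuity correction; the source does not state which variants were used, and the function names are illustrative.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic (unequal variances) from group means, SDs and sizes."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Age (years): non-tuberculosis 66.80±12.48 (n=127) vs tuberculosis 40.23±20.31 (n=106).
t_age = welch_t(66.80, 12.48, 127, 40.23, 20.31, 106)

# Interferon-γ release assay: tuberculosis 93 positive / 13 negative,
# non-tuberculosis 28 positive / 99 negative.
chi2_igra = pearson_chi2_2x2(93, 13, 28, 99)

print(f"t (age) = {t_age:.2f}, chi-square (IGRA) = {chi2_igra:.2f}")
```

Under these assumptions the two results come out at about 11.74 and 99.87, close to the tabulated t=11.746 and χ²=99.872; the small difference for age is consistent with rounding of the published means and standard deviations. Only these two rows were checked here.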



Variable	External validation group: tuberculosis (n=101)	External validation group: non-tuberculosis (n=40)	Test statistic (tuberculosis vs non-tuberculosis)	P value (tuberculosis vs non-tuberculosis)	External validation group: total (n=141)	Internal experimental group: total (n=233)	Test statistic (external vs internal)	P value (external vs internal)
Pleural effusion
Total protein (g/L)	51.78±7.74	47.46±7.21	t=1.925	0.049	50.56±7.78	45.56±9.08	t=-5.296	<0.001
Adenosine deaminase (U/L)	43.74±15.67	11.43±20.54	t=6.396	<0.001	34.67±22.42	31.50±24.90	t=-1.960	0.032
Carcinoembryonic antigen (ng/mL)	2.10(1.35,3.38)	195.79(6.96,200)	Z=-3.130	0.028	2.86(1.43,8.67)	1.17(0.50,3.21)	Z=-5.331	<0.001
Polymorphonuclear cell ratio	0.04(0.02,0.15)	0.18(0.07,0.33)	Z=-2.495	0.013	0.07(0.02,0.23)	0.11(0.05,0.42)	Z=2.823	0.005
Mononuclear cell ratio	0.96(0.85,0.98)	0.81(0.66,0.93)	Z=2.495	0.013	0.93(0.77,0.98)	0.89(0.58,0.95)	Z=-2.940	0.003
White blood cell count (×10⁹/L)	6.55±1.78	8.51±2.77	t=-2.625	0.016	7.10±2.26	8.20±4.08	t=2.556	0.062
Neutrophil ratio	0.77±0.08	0.77±0.08	t=-2.486	0.016	0.73±0.08	0.73±0.11	t=0.521	0.471
Platelet count (×10⁹/L)	334.80±102.82	300.13±123.11	t=1.082	0.028	325.07±108.90	285.24±119.75	t=-3.032	<0.001
Fever	28(27.7)	10(40.0)	χ²=1.406	0.036	38(27.0)	62(26.6)	χ²=0.612	0.434
Night sweats	18(17.8)	0(0.0)	χ²=0.886	0.046	18(12.7)	13(5.6)	χ²=0.977	0.323

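Model	Sensitivity (%)	Specificity (%)	Negative predictive value (%)	Positive predictive value (%)	AUC
Random forest	73.0	90.9	87.7	79.2	0.834
Support vector machine (linear kernel)	84.6	81.8	91.8	68.7	0.827
Support vector machine (polynomial kernel)	43.9	73.1	87.0	70.3	0.817
Logistic regression	61.5	94.5	83.8	84.2	0.815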
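The four percentage columns in the table above all derive from a 2×2 confusion table of model predictions against the reference diagnosis. The following Python sketch shows those definitions. The counts are hypothetical and chosen only for illustration; they are not taken from the study, whose underlying confusion tables are not given here, and the function name is illustrative.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV (as percentages) from 2x2 counts."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # TPE cases correctly flagged
        "specificity": 100.0 * tn / (tn + fp),  # non-TPE cases correctly cleared
        "ppv": 100.0 * tp / (tp + fp),          # flagged cases that are truly TPE
        "npv": 100.0 * tn / (tn + fn),          # cleared cases that are truly non-TPE
    }


# Hypothetical counts for illustration only (NOT study data):
# 40 true positives, 5 false positives, 10 false negatives, 45 true negatives.
example = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in example.items():
    print(f"{name}: {value:.1f}%")
```

Sensitivity and specificity depend only on how the model behaves within each diagnostic group, whereas the predictive values also depend on the proportion of TPE cases in the cohort being evaluated.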