
Chinese Journal of Antituberculosis ›› 2024, Vol. 46 ›› Issue (1): 92-99. doi: 10.19982/j.issn.1000-6621.20230273

• Original Articles •

Transcriptomics and machine learning algorithm-based characterization of ferroptosis-related genes in tuberculosis

Ye Jiang’e1, Fang Xuehui2, Xiong Yanjun3, Liu Shengsheng4

  1. Tuberculosis Ward 1, Anhui Chest Hospital, Hefei 230022, China
  2. Administration Office, Anhui Chest Hospital, Hefei 230022, China
  3. Tuberculosis Ward 8, Anhui Chest Hospital, Hefei 230022, China
  4. Tuberculosis Ward 7, Anhui Chest Hospital, Hefei 230022, China
  • Received: 2023-08-07 Online: 2024-01-10 Published: 2024-01-04
  • Contact: Liu Shengsheng, Email: 627905818@qq.com
  • Supported by:
    2022 Anhui Provincial Natural Science Foundation, General Program (2208085MH193)

Abstract:

Objective: To investigate the relationship between key ferroptosis-related genes and the pathogenesis of pulmonary tuberculosis using transcriptomics and machine learning methods. Methods: The NCBI GEO public repository (http://www.ncbi.nlm.nih.gov/geo) was searched with the keyword “pulmonary tuberculosis” and filtered by criteria including sequencing type (transcriptomics) and species (Homo sapiens), yielding two transcriptomic datasets, GSE153326 and GSE67589. GSE153326 served as the training dataset, comprising 8 blood samples from healthy individuals and 52 Mycobacterium tuberculosis (MTB)-positive blood samples. GSE67589 served as the validation dataset, comprising 30 blood samples from healthy individuals and 27 MTB-positive samples. Data were corrected and annotated using R scripts. After differentially expressed genes were identified in both datasets, the ferroptosis-related differentially expressed genes in the training dataset GSE153326 (TBFerDEGs) were selected and their expression was examined. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted on the TBFerDEGs. LASSO regression analysis and the SVM algorithm were applied to select key genes from the TBFerDEGs; ROC analysis was performed on these key genes, and the drug regulatory network related to them was explored. Finally, the key genes were examined in the validation dataset GSE67589 to validate the diagnostic value of the key ferroptosis-related genes identified in the training dataset. Results: Bioinformatics analysis identified 416 TBFerDEGs; after genes that did not meet the inclusion criteria were excluded, 56 TBFerDEGs remained. GO enrichment analysis showed that the ferroptosis-related biological processes in pulmonary tuberculosis included cellular response to chemical stress, regulation of autophagy, mitophagy, and mitochondrial disassembly; the pathways involved included the AMPK signaling pathway and ferroptosis. LASSO regression analysis and the SVM algorithm ultimately yielded five key ferroptosis-related genes, BID, AR, STK11, ALOX12, and SRC, with AUCs of 0.807, 0.858, 0.734, 0.840, and 0.880, respectively. In the validation dataset (GSE67589), the expression of AR (P=0.004) and SRC (P=0.017) differed significantly between the MTB-positive group and the healthy group. Conclusion: AR and SRC are key ferroptosis-related genes in pulmonary tuberculosis and may provide a reference for future basic research.

Key words: Tuberculosis, pulmonary; Ferroptosis; Gene expression; Machine learning
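
As an illustration of what the reported AUCs measure, the sketch below computes an AUC for a single gene from its expression values, using the rank (Mann-Whitney) interpretation: the probability that a randomly chosen MTB-positive sample has higher expression than a randomly chosen healthy sample. This is not the authors' R pipeline; the function name `auc_from_scores` and the expression values are hypothetical and are not taken from GSE153326.

```python
def auc_from_scores(scores, labels):
    """Return the AUC of `scores` for binary `labels` (1 = positive, 0 = negative).

    Uses the pairwise definition: the fraction of (positive, negative) pairs
    in which the positive sample has the higher score, counting ties as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical expression values for one gene (illustrative only).
expression = [2.1, 3.4, 1.8, 4.0, 3.9, 2.5, 4.2, 3.6]
is_tb = [0, 1, 0, 1, 1, 0, 1, 0]  # 1 = MTB-positive, 0 = healthy

auc = auc_from_scores(expression, is_tb)
print(f"AUC = {auc:.4f}")
```

For a gene whose expression is lower in MTB-positive samples, this statistic falls below 0.5; the value 1 - AUC then describes the same discriminative ability with the direction reversed.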

CLC Number: