
Chinese Journal of Antituberculosis ›› 2025, Vol. 47 ›› Issue (6): 769-778. doi: 10.19982/j.issn.1000-6621.20240563

• Original Article •

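Screening key genes and signaling pathways in tuberculosis based on the GEO database

石洁, 常文静, 郑丹薇, 苏茹月, 马晓光, 朱岩昆, 王少华, 孙建伟, 孙定勇

  1. Tuberculosis Reference Laboratory, Henan Provincial Center for Disease Control and Prevention, Zhengzhou 450016
  • Received: 2024-12-13 Publication date: 2025-06-10 Release date: 2025-06-11
  • Corresponding author: 孙定勇, Email: sundy2222@126.com
  • Funding:
    Natural Science Foundation of Henan Province (232300420290)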

Screening of core genes and pathways involved in tuberculosis onset based on GEO database

Shi Jie, Chang Wenjing, Zheng Danwei, Su Ruyue, Ma Xiaoguang, Zhu Yankun, Wang Shaohua, Sun Jianwei, Sun Dingyong

  1. Tuberculosis Reference Laboratory, Henan Province Center for Disease Control and Prevention, Zhengzhou 450016, China
  • Received: 2024-12-13 Online: 2025-06-10 Published: 2025-06-11
  • Contact: Sun Dingyong, Email: sundy2222@126.com
  • Supported by:
    Natural Science Foundation of Henan Province (232300420290)

Abstract (translated from the Chinese):

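Objective: To identify differentially expressed genes and related signaling pathways in tuberculosis using bioinformatics methods, in order to find biomarkers that can be used for tuberculosis diagnosis. Methods: The Gene Expression Omnibus (GEO) database was searched for gene expression microarray datasets from tuberculosis patients and healthy individuals, and the GSE139825 microarray dataset was downloaded as the analysis dataset. The limma package in R was used to normalize the expression data and identify differentially expressed genes (DEGs), and the clusterProfiler package was used for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. The STRING online database was used to analyze the protein-protein interaction (PPI) network of the DEGs, and Cytoscape was used to visualize the network and screen core genes. The GSE19439 microarray dataset was downloaded as the validation dataset for the differentially expressed core genes. Candidate biomarkers were also validated by enzyme-linked immunosorbent assay (ELISA), and their diagnostic performance was assessed by the area under the receiver operating characteristic curve (AUC). Results: Analysis of the GSE139825 dataset identified 206 DEGs, of which 172 were upregulated and 34 were downregulated. PDK4 and CABLES1 were downregulated by more than 50%, and IL1B, LOC728835, CXCL10, and IL8 were upregulated more than 8-fold. GO and KEGG analyses showed that the biological processes of the DEGs were concentrated mainly in cytokine-mediated signaling pathways, leukocyte cell-cell adhesion, and the response to lipopolysaccharide; that their main molecular functions were cytokine receptor binding and cytokine activity; and that they were significantly enriched in pathways including cytokine interactions, the TNF signaling pathway, and tuberculosis-related pathways. PPI analysis identified 10 core genes: IL1B, TNF, IL6, IL1A, CCL20, CXCL1, CXCL10, CXCL8, CCL3, and CCR7. In the GSE19439 validation dataset, CXCL10 and IL1B among the 10 core genes were likewise upregulated. ELISA validation found mean values in healthy controls and tuberculosis patients of 0.570 and 0.827 for CXCL10 protein and 1.245 and 2.067 for IL1B protein, respectively, and both differences were statistically significant (t=25.353, P<0.001; t=11.840, P=0.002). Logistic regression analysis showed that CXCL10 and IL1B both performed well in distinguishing the healthy group from the tuberculosis group (AUC for CXCL10 = 0.854; AUC for IL1B = 0.818). Conclusion: The study revealed interactions among genes related to tuberculosis onset and found that CXCL10 and IL1B can each distinguish healthy controls from tuberculosis patients well, so they may serve as new biomarkers for tuberculosis diagnosis.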
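The fold-change cut-offs quoted in the Results can be restated on the log2 scale that differential expression tools usually report: a decrease of more than 50% is a fold change below 0.5 (log2 fold change below -1), and an increase of more than 8-fold is a log2 fold change above 3. The Python sketch below only illustrates that arithmetic. The function names and the example fold changes are hypothetical and are not taken from the study, which ran its analysis with the limma package in R.

```python
import math


def log2_fold_change(fold_change: float) -> float:
    """Convert a linear fold change (patient / control) to the log2 scale."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold_change)


def classify(fold_change: float) -> str:
    """Label a gene against the two cut-offs quoted in the abstract."""
    lfc = log2_fold_change(fold_change)
    if lfc > 3:       # more than 8-fold up
        return "up >8-fold"
    if lfc < -1:      # more than 50% down
        return "down >50%"
    return "within cut-offs"


# Hypothetical fold changes, for illustration only (not study data).
examples = {"gene_A": 9.2, "gene_B": 0.4, "gene_C": 1.5}
labels = {gene: classify(fc) for gene, fc in examples.items()}
```

A gene at exactly 8-fold (log2 fold change of 3) is not counted as "more than 8-fold" in this sketch, because the comparison is strict.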

Key words: Tuberculosis, Genomic library, Data mining, Expressed genes, Biological markers

Abstract:

Objective: To identify differentially expressed genes and pathways involved in tuberculosis onset using bioinformatics analysis, and to find potential biomarkers for the diagnosis of tuberculosis. Methods: The microarray dataset GSE139825 was downloaded from the Gene Expression Omnibus (GEO) database, and the limma package in R was used to normalize the data and identify differentially expressed genes (DEGs). Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the DEGs were performed using the clusterProfiler package. Protein-protein interaction (PPI) networks of the DEGs were built with the STRING online tool, and Cytoscape was used to visualize the networks and screen core genes. The GSE19439 dataset was used to verify the differential expression of the core genes. Enzyme-linked immunosorbent assay (ELISA) was used to validate candidate biomarkers, and the area under the receiver operating characteristic (ROC) curve (AUC) was used to assess their diagnostic performance. Results: Analysis of the GSE139825 dataset identified 206 DEGs, including 172 upregulated and 34 downregulated genes. Among the downregulated genes, PDK4 and CABLES1 decreased by more than 50%, while IL1B, LOC728835, CXCL10, and IL8 increased more than 8-fold. GO and KEGG pathway analyses indicated that the biological processes of the DEGs were primarily associated with cytokine-mediated signaling pathways, leukocyte cell-cell adhesion, and responses to lipopolysaccharide. The DEGs predominantly showed molecular functions related to cytokine receptor binding and cytokine activity, and were significantly enriched in pathways such as cytokine interactions, TNF signaling, and tuberculosis-related pathways. PPI analysis identified 10 core genes: IL1B, TNF, IL6, IL1A, CCL20, CXCL1, CXCL10, CXCL8, CCL3, and CCR7. Analysis of the GSE19439 validation dataset confirmed that, of these, CXCL10 and IL1B were likewise upregulated. ELISA validation also showed significant differences between healthy controls and tuberculosis patients, with mean ELISA values of 0.570 and 0.827 for CXCL10 and 1.245 and 2.067 for IL1B, respectively (t=25.353, P<0.001; t=11.840, P=0.002). Logistic regression showed that CXCL10 and IL1B both performed well in distinguishing the healthy group from the tuberculosis group (AUC for CXCL10 = 0.854; AUC for IL1B = 0.818). Conclusion: The study revealed interactions among genes associated with tuberculosis onset and indicated that CXCL10 and IL1B could serve as new potential biomarkers for the diagnosis of tuberculosis.
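For readers less familiar with the AUC figures above: the AUC of a single marker equals the probability that a randomly chosen patient has a higher marker value than a randomly chosen healthy control, with ties counted as one half. The Python sketch below computes it that way. It is an illustration under stated assumptions, not the authors' procedure (the abstract reports logistic regression); for a single marker, a logistic model with a positive coefficient is a monotonic transformation of the marker and so yields the same AUC. The function name `roc_auc` and all values are hypothetical.

```python
def roc_auc(controls, cases):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case value exceeds the control value and
    0.5 if the two values are tied.
    """
    if not controls or not cases:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                score += 1.0
            elif case_value == control_value:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical ELISA readings, for illustration only (not study data).
healthy = [0.52, 0.55, 0.60, 0.61, 0.70]
patients = [0.58, 0.75, 0.80, 0.85, 0.90]

# 22 of the 25 patient/control pairs are ranked correctly, so auc is 0.88.
auc = roc_auc(healthy, patients)
```

The pairwise loop is quadratic in sample size, which is fine for a small validation cohort; a rank-based formula gives the same value for larger samples.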

Key words: Tuberculosis, Genomic library, Data mining, Gene expression, Biological markers
