Chinese Journal of Antituberculosis ›› 2022, Vol. 44 ›› Issue (11): 1193-1198. doi: 10.19982/j.issn.1000-6621.20220219
Li Xiangchen1, Liu Zhengwei2, Lu Yewei1, Zhu Yelei2, Zhang Mingwu2, Jiang Jinqin3, Peng Xiaojun1, Wang Weixin1, Gao Junshun1, Wang Xiaomeng2
Received:
2022-06-07
Publication date:
2022-11-10
Online release date:
2022-11-03
Corresponding author:
Wang Xiaomeng
E-mail: xmwang@cdc.zj.cn
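Abstract:
Whole-genome sequencing (WGS) is now widely used in research on Mycobacterium tuberculosis, including lineage identification, microevolution, drug-resistance prediction, transmission surveillance, and the diagnosis of mixed infections. Bioinformatics runs through every stage of genomic research, from data processing and analysis to visualization, and is essential to the application of WGS. The authors introduce and summarize the mainstream bioinformatics software and platforms currently used for WGS of M. tuberculosis, and review recently developed bioinformatics methods in terms of usability, software selection, and application. The aim is to give researchers in the field a reference for carrying out data analysis more conveniently and flexibly and for choosing analysis tools quickly.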
Li Xiangchen, Liu Zhengwei, Lu Yewei, Zhu Yelei, Zhang Mingwu, Jiang Jinqin, Peng Xiaojun, Wang Weixin, Gao Junshun, Wang Xiaomeng. Progress and application of whole genome sequencing data analysis of Mycobacterium tuberculosis[J]. Chinese Journal of Antituberculosis, 2022, 44(11): 1193-1198. doi: 10.19982/j.issn.1000-6621.20220219