
Chinese Journal of Antituberculosis ›› 2020, Vol. 42 ›› Issue (6): 614-620. doi: 10.3969/j.issn.1000-6621.2020.06.014

• Original Articles •

A study of prediction effect of autoregressive integrated moving average model on the monthly reported pulmonary tuberculosis cases in China

ZHANG Shun-xian, QIU Lei, ZHANG Shao-yan, LI Cui, HU Jun, TIAN Li-ming, LU Zhen-hui

  1. Respiratory Research Institute (ZHANG Shun-xian, QIU Lei, ZHANG Shao-yan, LI Cui, TIAN Li-ming, LU Zhen-hui) and Microbiology Laboratory (HU Jun), Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
  • Received: 2020-01-17 Online: 2020-06-10 Published: 2020-06-11
  • Contact: LU Zhen-hui E-mail: Dr_luzh@shutcm.edu.cn
  • Funding:
    National Science and Technology Major Project of the 13th Five-Year Plan (2018ZX10725-509)

Abstract:

Objective To establish an autoregressive integrated moving average (ARIMA) model and evaluate how well it predicts the monthly number of reported pulmonary tuberculosis cases in China (excluding Hong Kong, Macao and Taiwan regions; the same below), so as to provide a scientific reference for formulating pulmonary tuberculosis prevention and control measures.

Methods Monthly numbers of reported pulmonary tuberculosis cases in China from January 2006 to August 2019 were collected from the monthly summaries of Class A, B and C notifiable infectious diseases published in Disease Surveillance, a journal sponsored by the Chinese Center for Disease Control and Prevention. Using SPSS 26.0, a time series was built from the data for January 2006 to December 2018, and the ARIMA model type and orders were preliminarily identified. Candidate models were then screened on the following criteria: model parsimony; statistical significance of every model parameter (autoregressive [AR], moving average [MA], seasonal autoregressive [SAR] and seasonal moving average [SMA] terms; all P<0.05); a Ljung-Box Q statistic (the overall model test) with P>0.05; the largest stationary R²; the smallest normalized Bayesian information criterion (NBIC); and the smallest root mean square error (RMSE). The reported case numbers for January to August 2019 served as validation data, and the model whose predictions had the smallest relative error was chosen as the optimal model. Finally, that model was used to predict the monthly numbers of reported pulmonary tuberculosis cases in China from September 2019 to December 2020.

Results The time series built from the monthly case numbers for 2006 to 2018 indicated that an ARIMA (p, d, q) or ARIMA (p, d, q)×(P, D, Q) model should be fitted. Twelve basic models were retained that had a Ljung-Box Q statistic with P>0.05, were parsimonious, and had all parameters statistically significant (all P<0.05). From these, four candidate models were taken forward: the model with the largest stationary R² (ARIMA (1, 0, 1) (0, 1, 1)₁₂, R²=0.707), the model with the smallest RMSE (ARIMA (0, 1, 2) (0, 1, 1)₁₂, RMSE=9147.85), the model with the smallest NBIC (ARIMA (0, 1, 1) (0, 1, 1)₁₂, NBIC=18.355), and the model with the smallest Ljung-Box Q (ARIMA (1, 1, 1) (0, 1, 1)₁₂, Ljung-Box Q=8.797). Each was used to predict the monthly reported case numbers for January to August 2019, and the predictions were compared with the actual reported numbers. The ARIMA (0, 1, 1) (0, 1, 1)₁₂ model had the smallest mean relative error (0.55%), with MA (1)=0.875 (t=19.243, P<0.001), SMA (1)=0.876 (t=7.596, P<0.001) and Ljung-Box Q=9.876 (df=16, P=0.873), and was selected as the optimal model. This model was then used to predict the monthly numbers of reported pulmonary tuberculosis cases in China from September 2019 to December 2020; for January to December 2020 it predicted a total of 1025863 cases, an average of 85489 cases per month.

Conclusion The ARIMA (0, 1, 1) (0, 1, 1)₁₂ model predicts the monthly number of reported pulmonary tuberculosis cases in China well. However, model building and prediction are a dynamic process: the model needs to be adjusted continually as data accumulate in order to improve prediction accuracy.
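Written out with the backshift operator B, the selected ARIMA (0, 1, 1) (0, 1, 1) model with seasonal period 12 takes the standard form sketched below. This rests on assumptions the abstract does not state: that the MA (1) and SMA (1) estimates follow the Box-Jenkins sign convention, that no constant term was fitted, and that the series was not transformed. The symbols y_t (reported cases in month t) and ε_t (the random shock in month t) are introduced here only for illustration.

```latex
\begin{align}
(1 - B)(1 - B^{12})\, y_t &= (1 - \theta_1 B)(1 - \Theta_1 B^{12})\, \varepsilon_t,
  \qquad \theta_1 = 0.875,\quad \Theta_1 = 0.876 \\
y_t &= y_{t-1} + y_{t-12} - y_{t-13}
  + \varepsilon_t - 0.875\,\varepsilon_{t-1} - 0.876\,\varepsilon_{t-12}
  + 0.7665\,\varepsilon_{t-13}
\end{align}
```

The second line expands the first; the coefficient 0.7665 is the product 0.875 × 0.876.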

Key words: Tuberculosis, pulmonary; Disease notification; Epidemiologic research design; Models, statistical; Forecasting
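The validation and forecasting steps described in the abstract can be sketched in plain Python. This is not the authors' SPSS procedure. The function names `mean_relative_error` and `airline_forecast` are hypothetical, the relative error is assumed to be the mean absolute relative error, and the forecast recursion assumes the Box-Jenkins sign convention, no constant term, and pre-sample residuals of zero.

```python
from statistics import fmean


def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / actual over the validation months.

    Assumption: the abstract's "relative error" is averaged in absolute value.
    """
    return fmean(abs(p - a) / a for a, p in zip(actual, predicted))


def airline_forecast(y, theta, seasonal_theta, steps, s=12):
    """Forecast `steps` periods ahead with ARIMA(0,1,1)(0,1,1)_s.

    theta and seasonal_theta are the MA(1) and SMA(1) coefficients.
    Residuals for the first s + 1 observations are fixed at zero
    (a simplifying assumption, not necessarily what SPSS does).
    """
    if len(y) < s + 2:
        raise ValueError("need at least s + 2 observations")
    y = [float(v) for v in y]
    e = [0.0] * len(y)  # one-step-ahead residuals

    def predict(t):
        # Differencing terms plus MA terms in past residuals.
        return (y[t - 1] + y[t - s] - y[t - s - 1]
                - theta * e[t - 1]
                - seasonal_theta * e[t - s]
                + theta * seasonal_theta * e[t - s - 1])

    # Pass over the observed data to build up the residuals.
    for t in range(s + 1, len(y)):
        e[t] = y[t] - predict(t)

    n = len(y)
    for t in range(n, n + steps):
        y.append(0.0)
        e.append(0.0)      # future shocks have expectation zero
        y[t] = predict(t)  # earlier forecasts stand in for unseen values
    return y[n:]
```

With the monthly series up to December 2018 in a list `cases`, a call such as `airline_forecast(cases, 0.875, 0.876, steps=8)` would give January to August 2019 predictions to pass to `mean_relative_error` together with the reported numbers. Because of the simplified residual initialisation, the results should not be expected to reproduce the SPSS output exactly.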