Research on Multimodal Data-Driven Combined Forecasting Method and Its Application

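Author	于婷
Name (Hanyu Pinyin)	Yu Ting
Student ID	2019000003017
Institution	Lanzhou University of Finance and Economics
Phone	15706059028
Email	Fearless_yt@163.com
Enrollment date	2019-9
Degree category	Academic master's
Level of study	Master's student
Discipline category	Science
First-level discipline	Statistics
Research area	Mathematical Statistics
Discipline code	0714Z3
First supervisor	孟生旺
First supervisor (Hanyu Pinyin)	Meng Shengwang
First supervisor's institution	Renmin University of China
First supervisor's title	Professor
Title	多模态数据驱动的组合预测方法研究及应用
English title	Research on Multimodal Data-Driven Combined Forecasting Method and Its Application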
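Keywords	Container throughput; Airport passenger flow forecasting; Decomposition-integration; Secondary decomposition; Network search information; Cuckoo search algorithm; Sparrow search algorithm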
Foreign-language keywords	Container Throughput ; Network Search Information ; Cuckoo Search Algorithm ; Sparrow Search Algorithm ; Airport Passenger Flow Forecast ; Decomposition-Integration ; Quadratic Decomposition
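Abstract	To forecast time series data with higher accuracy, this thesis proposes new combined forecasting methods under the "decomposition-integration" framework, building forecasting models based on primary decomposition, secondary decomposition, and network search information, respectively. First, a combined forecasting model based on EEMD-PR/PSO-LSSVR-PM is proposed. Ensemble empirical mode decomposition (EEMD) decomposes the data into several intrinsic mode functions (IMFs) of different frequencies to reduce the complexity of the data. Least squares support vector regression optimized by particle swarm optimization (PSO-LSSVR) then forecasts each IMF, univariate polynomial regression (PR) forecasts the residual term, which carries the trend, and a perceptron model (PM) integrates the component forecasts nonlinearly to produce the final forecast. In the empirical analysis, the ten Chinese ports with the highest container throughput in 2019 are taken as the research objects; K-Means clustering first groups the ports into three classes by data characteristics, and Guangzhou, Yingkou, and Shanghai are selected as the representative ports, one from each class. The proposed combined forecasting method is then applied to these ports. Second, building on primary decomposition, a secondary decomposition-integration forecasting model is proposed. Beyond the reduction in complexity achieved by primary decomposition, the model further mines latent features of the data and avoids error accumulation in forecasting by reconstructing the subsequences. In step one, EEMD decomposes the original airport passenger flow series, and the resulting subsequences are reconstructed into high-, medium-, and low-frequency series. In step two, because the high- and medium-frequency series fluctuate strongly and rapidly, variational mode decomposition (VMD) decomposes them further into subsequences that are less complex and easier to forecast. In step three, a BP neural network optimized by the cuckoo search algorithm (CS-BP) forecasts all subsequences, with the optimal lag order of the network determined adaptively by trial and error. In step four, the CS-BP model integrates the forecasts of the high-frequency subsequences and of the medium-frequency subsequences separately, giving the high-frequency and medium-frequency forecasts. Finally, the CS-BP model aggregates the high-, medium-, and low-frequency forecasts into the final forecast. Third, a new combined forecasting method based on network search information is proposed under the "decomposition-integration" framework. Network search keywords related to airport passenger throughput are screened using mean impact value and time-lag correlation analysis; the optimal lag of each keyword is determined from the correlation between its search volume and the original air passenger flow data, and a composite search index is then synthesized. Next, the ICEEMDAN method decomposes both the airport passenger throughput and the composite search index into several sub-mode series, which are reconstructed into high-, medium-, and low-frequency series according to the sample entropy of each subsequence. With the different frequency components of the search index as auxiliary inputs, a BP neural network optimized by the sparrow search algorithm (SSA-BP) forecasts the high- and medium-frequency series of airport passenger throughput, while an autoregressive distributed lag model forecasts the low-frequency series; SSA-BP then integrates the forecasts of the different frequency series into the final forecast. Empirical results show that the proposed methods achieve high forecasting accuracy and robustness in both port container throughput forecasting and airport passenger flow forecasting.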
English abstract	To forecast time series data with higher accuracy, this thesis proposes new combined forecasting methods based on the "decomposition-integration" framework and establishes forecasting models based on primary decomposition, secondary decomposition, and network search information. Empirically, the combined forecasting methods are found to significantly improve forecasting accuracy and to show better robustness. First, a combined prediction model based on EEMD-PR/PSO-LSSVR-PM is proposed. Ensemble empirical mode decomposition (EEMD) is used to decompose the data into multiple intrinsic mode functions (IMFs) with different frequencies to reduce the complexity of the data. Particle swarm optimized least squares support vector regression (PSO-LSSVR) is then used to predict the IMFs separately, univariate polynomial regression (PR) is used to predict the residual with trend, and finally a perceptron model (PM) performs nonlinear integration of the individual series to obtain the final prediction. In the empirical analysis, the top ten large-scale ports in China by container throughput in 2019 are taken as the research object; K-Means clustering is used to classify the ports into three categories according to data characteristics, and Guangzhou, Yingkou, and Shanghai are selected as the representative ports, one from each category. Empirical forecasting is carried out using the combined forecasting method proposed in this thesis. Second, a secondary decomposition-integration prediction model is established. Building on the reduction in complexity of the original data achieved by the primary decomposition, the latent features of the data are further mined, and the accumulation of errors in prediction is avoided by reconstructing the subsequences.
In the first step, the original airport passenger flow is decomposed by ensemble empirical mode decomposition (EEMD), and the obtained subsequences are reconstructed into high-, medium-, and low-frequency sequences. In the second step, because the high- and medium-frequency sequences fluctuate strongly and rapidly, variational mode decomposition (VMD) is used to decompose them further into subsequences that are less complex and easier to predict. In the third step, a BP neural network optimized by the cuckoo search algorithm (CS-BP) predicts all the subsequences, and trial and error is used to adaptively determine the optimal lag period of the neural network model. In the fourth step, the CS-BP model integrates the predicted values of the high-frequency and medium-frequency subsequences, respectively, to obtain the high-frequency and medium-frequency predictions. Finally, the CS-BP model aggregates all the high-, medium-, and low-frequency predictions into the final predicted value. Third, this thesis proposes a new "decomposition-integration" combined forecasting method based on network search information. Mean impact value and time-difference correlation analysis are used to screen the network search keywords related to airport passenger throughput; the optimal lag period is determined from the correlation between the search volume of each keyword and the original air passenger flow data, and a comprehensive search index is then synthesized. Next, the airport passenger throughput and the comprehensive search index are each decomposed into several sub-modal sequences by the ICEEMDAN method and reconstructed into high-, medium-, and low-frequency sequences according to the sample entropy values of the subsequences.
Using the different frequency components of the search index as auxiliary input, the high-frequency and medium-frequency sequences of airport passenger throughput are predicted by a BP neural network optimized by the sparrow search algorithm (SSA-BP), while the low-frequency sequence is predicted by an autoregressive distributed lag model; finally, the predictions for the different frequency sequences are integrated with SSA-BP to obtain the final prediction. The combined forecasting methods proposed in this thesis are applied in empirical forecasting studies. The results show that they achieve high forecasting accuracy and robustness in both port container throughput forecasting and airport passenger flow forecasting.
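Degree type	Master's
Defense date	2022-05-15
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	68
Number of references	48
Call number	0004147
Confidentiality level	Public
CLC number	O212/25
Document type	Thesis
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/32419
Collection	School of Statistics and Data Science
Recommended citation (GB/T 7714)	于婷. 多模态数据驱动的组合预测方法研究及应用[D]. 甘肃省兰州市. 兰州财经大学,2022.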

Files in this item
File name/size	Document type	Version type	Access type	License
2019000003017.pdf (4011KB)	Thesis		Open access	CC BY-NC-SA
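
The "decomposition-integration" idea described in the abstract (decompose the series, forecast each component, then combine the component forecasts) can be illustrated with a minimal sketch. It is not the thesis's implementation: a trailing moving average stands in for EEMD, a least-squares AR(1) fit stands in for PSO-LSSVR or CS-BP, and plain summation stands in for the perceptron or CS-BP integration step. The function names and the `window` parameter are hypothetical and chosen only for illustration.

```python
from statistics import mean


def moving_average(series, window):
    """Trailing moving average; early points average whatever history exists."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        out.append(mean(series[lo:i + 1]))
    return out


def decompose(series, window=4):
    """Stand-in for EEMD (assumption): split the series into a smooth
    low-frequency part and a high-frequency remainder that sum to the input."""
    low = moving_average(series, window)
    high = [x - l for x, l in zip(series, low)]
    return {"high": high, "low": low}


def fit_ar1(series):
    """Least-squares fit of x[t] = a + b * x[t-1]; needs at least two points."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # Constant input: fall back to predicting the mean.
        return my, 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def forecast_next(series):
    """One-step-ahead forecast of a single component (stand-in forecaster)."""
    a, b = fit_ar1(series)
    return a + b * series[-1]


def decompose_and_forecast(series, window=4):
    """Decompose, forecast each component, and integrate by summation
    (a simplification of the nonlinear integration described in the abstract)."""
    components = decompose(series, window)
    parts = {name: forecast_next(c) for name, c in components.items()}
    return sum(parts.values()), parts


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the thesis.
    demo = [10.0, 12.0, 11.0, 13.0, 14.0, 13.5, 15.0, 16.0, 15.5, 17.0]
    total, parts = decompose_and_forecast(demo)
    print("component forecasts:", parts)
    print("integrated forecast:", total)
```

Replacing `decompose` with a real EEMD, VMD, or ICEEMDAN routine and `forecast_next` with a trained regressor would bring the sketch closer to the pipelines summarized in the abstract; the surrounding structure stays the same.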