Spatio-temporal Prediction of PM2.5 Based on "Decomposition-Clustering-Integration" and Its Application

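Author	周尧民
Name (Hanyu Pinyin)	ZhouYaomin
Student ID	2019000003027
Institution	Lanzhou University of Finance and Economics (兰州财经大学)
Telephone	13893612323
Email	598127220@qq.com
Year of enrollment	2019-9
Degree category	Academic master's
Level of study	Master's student
Discipline category	Economics
First-level discipline	Applied Economics
Specialization	Statistics
Discipline code	020208
Degree conferred	Master of Economics
First supervisor	黄恒君
First supervisor (Hanyu Pinyin)	HuangHengjun
First supervisor's institution	Lanzhou University of Finance and Economics (兰州财经大学)
First supervisor's title	Professor
Title	基于“分解-聚类-集成”的PM2.5时空预测研究及其应用
English title	Spatio-temporal prediction of PM2.5 based on decomposition-clustering-integration and its application
Keywords	Spatio-temporal prediction; modal decomposition; time series clustering; Laplacian operator; LSTM neural network
Keywords (English)	Spatiotemporal prediction; Modal decomposition; Time series clustering; Laplacian; LSTM neural network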
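Abstract	As cities develop and populations concentrate, urban vehicle ownership keeps rising and nearby factories emit air pollutants; together these degrade the urban environment and seriously affect residents' travel and health. Urban big data such as air quality data, meteorological data, and spatial POI data can therefore be used to build an accurate air quality prediction model that helps residents plan their travel and supports government environmental decision-making. Building the model from both the temporal and the spatial dimension broadens the research perspective and applies the idea of data fusion, combining time series with spatial information. Taking PM2.5 concentration as an example, this thesis explores how to extract features of the PM2.5 series in the temporal and spatial dimensions, incorporates them into the prediction model, and dynamically combines the temporal and spatial predictions to improve predictive performance. The main work is as follows. First, the theory and algorithms underlying spatio-temporal models are examined, including the handling of missing values and outliers and feature selection based on correlation theory. On this basis, the thesis studies in depth the frontier algorithms used for PM2.5 prediction, namely modal decomposition, time series clustering, and deep neural networks, and builds the theoretical framework of the PM2.5 spatio-temporal prediction model. Second, the spatio-temporal prediction model is constructed: after analyzing the characteristics of PM2.5 prediction in the temporal and spatial dimensions, a temporal predictor and a spatial predictor are built separately. In the temporal dimension, modal decomposition extracts the fluctuation features of the PM2.5 data, a time series clustering algorithm reconstructs the components, and the temporal predictor is built on the ELSTM model. In the spatial dimension, the Laplacian operator extracts the spatial relationships among monitoring stations from a graph-model perspective, and the spatial predictor is built on that basis. Finally, XGBoost dynamically aggregates the two sets of results, completing the LX-M-CEEMDAN-VMD-LSTM model. Third, air pollutant concentration data, meteorological data, and geographic information for Lanzhou are used to predict the PM2.5 concentration series. In the temporal prediction module, CEEMDAN and VMD are combined into a two-layer decomposition that extracts temporal information, followed by clustering-based reconstruction; this improves the accuracy of the time series prediction, and the clustering-based reconstruction further simplifies the model. In the spatial prediction module, the Laplacian matrix effectively extracts the spatial features of the data and improves spatial prediction accuracy. Feature importances extracted with XGBoost are used to combine the temporal and spatial features dynamically, compensating for the weaknesses of each dimension. The model's superiority and effectiveness are assessed with three evaluation metrics, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), together with the Diebold-Mariano (DM) test. The results show that the proposed model achieves significantly higher prediction accuracy and outperforms the comparison models on every metric.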
English abstract	With the development of cities and the agglomeration of population, the number of cars in cities continues to increase, and air pollutants emitted by surrounding factories add to the deterioration of the urban environment, seriously affecting the travel and health of residents. Urban big data such as air quality data, meteorological data, and spatial POI data can therefore be used to build an accurate air quality model that helps residents make travel plans and assists the government in making environmental protection decisions. Building the air quality prediction model from both the time dimension and the space dimension not only enriches the research perspective but also applies the concept of data fusion, integrating time series with spatial information. Taking PM2.5 concentration as an example, this paper explores feature extraction methods for the PM2.5 series in the time and space dimensions, incorporates the features into the prediction model, and dynamically combines the prediction results of the two dimensions to improve predictive performance. The main work is as follows. Firstly, the theoretical algorithms of spatiotemporal models are explored, including the processing of missing data and outliers and feature selection using correlation theory. On this basis, this paper studies in depth several cutting-edge algorithms for PM2.5 prediction, namely modal decomposition, time series clustering, and deep neural networks, and constructs the theoretical framework of the PM2.5 spatiotemporal prediction model. Secondly, the spatiotemporal prediction model is built: a temporal predictor and a spatial predictor are constructed on the basis of an analysis of the characteristics of PM2.5 prediction in the time and space dimensions. In the time dimension, modal decomposition is used to extract the fluctuation characteristics of the PM2.5 data, a time series clustering algorithm is used to reconstruct the components, and the temporal predictor is constructed on the ELSTM model. In the space dimension, the Laplacian operator is used to extract the spatial relationships among monitoring sites from the perspective of a graph model, and the spatial predictor is constructed on that basis. Finally, XGBoost is used to dynamically aggregate the two sets of results, completing the construction of the LX-M-CEEMDAN-VMD-LSTM model. Thirdly, the PM2.5 concentration series is predicted using Lanzhou air pollutant concentration data, meteorological data, and geographic information. In the temporal prediction module, CEEMDAN and VMD are combined into a two-level decomposition method to extract time series information, followed by cluster-based reconstruction, which not only improves the accuracy of time series prediction but also further simplifies the model. In the spatial prediction module, the Laplacian matrix effectively extracts the spatial features of the data and improves the accuracy of spatial prediction. The importance of the various features is extracted with XGBoost, and the temporal and spatial features are dynamically combined to compensate for the deficiencies of each dimension. Root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to assess the advantages and effectiveness of the model. The empirical results show that the prediction accuracy of the proposed model is significantly improved and that it outperforms the comparison models on all indicators.
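Degree type	Master's
Defense date	2022-05-15
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	66
Total references	48
Collection number	0004157
Confidentiality level	Public
Chinese Library Classification number	C8/290
Document type	Thesis
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/32156
Collection	School of Statistics and Data Science (统计与数据科学学院)
Recommended citation (GB/T 7714)	周尧民. 基于“分解-聚类-集成”的PM2.5时空预测研究及其应用[D]. 甘肃省兰州市. 兰州财经大学,2022.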
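
The abstract in this record names three accuracy metrics used to compare models: RMSE, MAE, and MAPE. As a rough illustration only, the sketch below computes them in plain Python using their standard textbook definitions. The function names and the `observed`/`forecast` values are made up for the example and are not taken from the thesis or its data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so callers should filter
    or guard such observations beforehand.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical PM2.5 concentrations (illustrative values only).
observed = [40.0, 50.0, 80.0, 100.0]
forecast = [44.0, 45.0, 84.0, 95.0]

print(f"RMSE = {rmse(observed, forecast):.3f}")
print(f"MAE  = {mae(observed, forecast):.3f}")
print(f"MAPE = {mape(observed, forecast):.2f}%")
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the observed level, which is why the three are often reported together.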

Files in this item
File name/size	Document type	Version type	Access type	License
10741_2019000003027_ (6273KB)	Thesis		Open access	CC BY-NC-SA