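Author: 刘艺彬
Name in Hanyu Pinyin: Liu Yibin
Student ID: 2021000003021
Training institution: 兰州财经大学 (Lanzhou University of Finance and Economics)
Phone: 18191385233
Email: 1335586823@qq.com
Year of enrollment: 2021-9
Degree category: Professional master's degree
Training level: Master's student
First-level discipline: Applied Statistics
Discipline code: 0252
First supervisor: 孙景云
First supervisor's name in Hanyu Pinyin: Sun Jingyun
First supervisor's institution: 兰州财经大学 (Lanzhou University of Finance and Economics)
First supervisor's title: Professor
Title: 融合多源数据信息的上证指数趋势预测
English title: Trend prediction of Shanghai Stock Exchange Index based on Multi-source data information
Keywords: multi-source data fusion; multivariate variational mode decomposition; convolutional neural network; rise/fall prediction
Foreign-language keywords: Multi-source data fusion; Multivariate variational mode decomposition; Convolutional neural networks; Fluctuation forecast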
Abstract

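  China's stock market is increasingly favored by domestic and foreign investors, but high returns come with high risks that can lead to losses, so building stock price prediction models is especially important for managing market risk. Stock prices are influenced by many uncertain factors, including macroeconomic conditions and investor expectations. As a result, predicting prices with models that fuse multi-source data has gradually become a new trend in investment research.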

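  Building on the existing literature, this thesis takes the Shanghai Stock Exchange Index as an example and carries out empirical studies of both numerical prediction and rise/fall prediction. The main work is as follows. (1) Investor attention indicators are extracted from web search information, and a combined prediction model for the index based on multivariate variational mode decomposition (MVMD) is proposed. First, time difference correlation analysis (TDCA) is used to select the Baidu search keyword series that are related to the index; the keywords are divided into three categories by meaning, kernel principal component analysis (KPCA) is applied to each category for dimensionality reduction and feature extraction, and the kernel principal components whose cumulative contribution rate exceeds 75% serve as auxiliary predictors of the index. Second, MVMD is used to decompose the closing price of the index and the auxiliary predictors simultaneously, and the resulting components are reconstructed into high-, medium-, and low-frequency series according to sample entropy and correlation measures. Finally, random forest (RF), support vector machine (SVM), and long short-term memory (LSTM) neural network models optimized by the sparrow search algorithm (SSA) predict each subseries, and the predictions are linearly combined to give the final forecast. (2) By fusing multi-source data, a rise/fall prediction model for the index based on a convolutional neural network (CNN) is proposed. First, the wavelet transform (WT) is used to denoise the index data, 53 technical indicators are then computed, and support vector machine recursive feature elimination (SVM-RFE) is used to select among them. Second, a CNN performs dimensionality reduction and feature extraction separately on each information source: macroeconomic data, the improved technical indicator data, and the different categories of Baidu keywords. Finally, a BP neural network, an SVM, and an LSTM neural network based on the grey wolf optimization (GWO) algorithm are each used to predict the direction of the index.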
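As an illustration of the keyword-screening step, time difference correlation analysis can be sketched as computing the Pearson correlation between a search series and the index over a range of leads and lags, then keeping the lag with the largest absolute correlation. The following is a minimal pure-Python sketch under that reading. The function names, the sign convention for the lag, and the `max_lag` parameter are assumptions made for illustration and are not taken from the thesis.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def time_difference_correlation(x, y, max_lag):
    """Return {lag: r} for lag in [-max_lag, max_lag].

    Assumed convention: lag > 0 pairs x[t] with y[t + lag],
    i.e. the candidate series x leads the reference series y by `lag` periods.
    """
    n = min(len(x), len(y))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:n]
        else:
            xs, ys = x[-lag:n], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(x, y, max_lag):
    """Lag with the largest absolute correlation, and that correlation."""
    corr = time_difference_correlation(x, y, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    return lag, corr[lag]
```

Under this convention, a keyword series whose best lag is positive moves ahead of the index and is a candidate leading indicator. Any threshold on the correlation used to keep or discard a keyword would be a further modeling choice that this sketch does not fix.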

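  The empirical results show that the combined prediction model that fuses web search information under the MVMD decomposition framework effectively improves prediction accuracy. Compared with the model using unclassified Baidu search information, using the classified search information as auxiliary predictors yields higher accuracy on every evaluation metric, and the sets of Baidu search keywords of different types, as micro-level variables reflecting investor attention to the index's movements, change as the market and economic environment changes. In addition, comparisons with several benchmark models using multi-source data for stock evaluation (historical trading data, technical indicators, macroeconomic variables, and web search information) show that fusing multi-source data reduces the error of the prediction model and outperforms prediction from any single information source. Finally, the proposed method is applied to rise/fall prediction of the CSI 300 Index, and a trading strategy for CSI 300 stock index futures built on the predictions is back-tested; the back-test based on these predictions earns good excess returns.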

English abstract

  China's stock market is increasingly favored by investors at home and abroad, but high returns and high risks coexist in the stock market, which may bring losses to investors. To manage market risk effectively, it is essential to build a stock price forecasting model. Because stock prices are affected by many uncertain factors such as the macroeconomy and investors' expectations, forecasting stock prices with models that incorporate multiple sources of data has gradually become a new trend in investment research.

  Drawing on the existing literature, this paper takes the Shanghai Stock Exchange Index as an example and carries out empirical research on both numerical prediction and rise/fall prediction. The main work is as follows. (1) An investor attention index is extracted from online search data, and a combined prediction model for the Shanghai Stock Exchange Index based on multivariate variational mode decomposition (MVMD) is proposed. First, time difference correlation analysis (TDCA) is used to screen out the Baidu search keyword sequences related to the index, and the keywords are divided into three categories according to their meanings. Kernel principal component analysis (KPCA) is then applied to each category to reduce dimensionality and extract features, and the kernel principal components with a cumulative contribution rate of over 75% are used as auxiliary predictors of the index. Second, MVMD is used to decompose the closing price of the index simultaneously with the auxiliary predictors, and the components are reconstructed into high-, medium-, and low-frequency sequences based on sample entropy and a correlation index. Finally, Random Forest (RF), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) neural network models optimized by the Sparrow Search Algorithm (SSA) forecast each subsequence individually, and their outputs are combined linearly to produce the final prediction. (2) By fusing multi-source data, this paper proposes a rise/fall forecasting model for the Shanghai Stock Exchange Index based on a Convolutional Neural Network (CNN). First, the wavelet transform (WT) is used to denoise the index data; 53 technical indicators are then calculated and screened by support vector machine recursive feature elimination (SVM-RFE). Second, a CNN is used to reduce the dimension of, and extract features from, each information source separately: macroeconomic data, the improved technical indicator data, and the different types of Baidu keywords. Finally, a BP neural network, an SVM, and an LSTM based on the Grey Wolf Optimization (GWO) algorithm are each used to forecast the rise or fall of the index.
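The reconstruction step groups decomposed components by complexity, and sample entropy is the complexity measure named in the abstract. As a rough sketch of how that quantity is commonly computed, the following pure-Python function counts pairs of matching templates of length m and m + 1 under the Chebyshev distance and returns the negative log of their ratio. The defaults `m=2` and a tolerance of 0.2 times the standard deviation are common conventions assumed here; the thesis's actual settings and grouping thresholds are not stated in this record.

```python
from math import log, sqrt


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) with r = r_factor * population std of the series.

    Returns float('inf') when no template matches are found.
    """
    n = len(series)
    mean = sum(series) / n
    std = sqrt(sum((v - mean) ** 2 for v in series) / n)
    tol = r_factor * std

    def count_matches(length):
        # Use the first n - m start positions for both template lengths
        # so that the two counts are directly comparable.
        templates = [series[i : i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # i < j excludes self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= tol:
                    count += 1
        return count

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")
    return -log(a / b)
```

Lower values indicate a more regular series and higher values a more irregular one, which is why components with similar sample entropy are natural candidates to be merged into the same frequency band. The double loop is quadratic in the series length, so a vectorized implementation would be preferable for long series.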

  The empirical results show that the combined forecasting model that incorporates online search information under the MVMD decomposition framework effectively improves forecasting accuracy. Compared with the model using unclassified Baidu search information, using the classified Baidu search information as auxiliary predictors achieves higher forecasting accuracy on all indicators, and the different types of Baidu search keyword sets, as micro-level variables reflecting investors' attention to the trend of the Shanghai Stock Exchange Index, change as the market and economic environment changes. Comparing models built on the multi-source data for stock evaluation (historical trading data, technical indicators, macroeconomic variables, and online search information) with a variety of benchmark models shows that fusing these sources reduces the error of stock prediction models and outperforms prediction from any single source of information. Furthermore, the proposed method is applied to forecast the rise and fall of the CSI 300 Index, and a trading strategy for CSI 300 stock index futures built on the predictions is back-tested. The back-test based on the predictions in this paper obtains good excess returns.

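Degree type: Master's
Defense date: 2024-05-25
Place of degree conferral: Lanzhou, Gansu Province
Language: Chinese
Total pages: 76
Total references: 64
Collection number: 0005622
Confidentiality level: Public
Chinese Library Classification number: C8/398
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/36792
Collection: 统计与数据科学学院 (School of Statistics and Data Science)
Recommended citation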
GB/T 7714
刘艺彬. 融合多源数据信息的上证指数趋势预测[D]. 甘肃省兰州市. 兰州财经大学,2024.
Files in this item
File name/size: 2021000003021.pdf (5408 KB); document type: thesis; access: open access; license: CC BY-NC-SA