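Author: 曹静如
Name (Hanyu Pinyin): caojingru
Student ID: 2020000003043
Institution: Lanzhou University of Finance and Economics (兰州财经大学)
Phone: 18298404686
Email: 851619471@qq.com
Year of enrollment: 2020-9
Degree category: Professional master's degree
Program level: Master's graduate student
First-level discipline: Applied Statistics
Discipline code: 0252
First supervisor: 孙景云
First supervisor (Hanyu Pinyin): sunjingyun
First supervisor's institution: Lanzhou University of Finance and Economics
First supervisor's title: Professor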
Title: 基于文本和网络搜索信息的游客流量预测研究——以海南为例
English title: Research on Tourist Flow Forecasting Based on Text and Web Search Information: A Case Study of Hainan
Keywords: tourism forecasting; KPCA; Baidu Index; SnowNLP; sentiment index

Abstract

With the advent of the 5G era, more and more tourists use the Internet to learn about destinations in advance and then plan their trips. Such Internet data can greatly improve the accuracy of tourist-arrival forecasts: they allow tourist behavior to be monitored dynamically, and they overcome the publication lag of traditional tourism statistics. It is therefore necessary to forecast tourist flow from Internet data.

Building on the existing literature, this thesis uses the monthly tourist flow of Hainan Province as a case for an empirical study of tourism forecasting. The main work is as follows. (1) Web search data are noisy, nonlinear, and highly volatile, which makes both keyword selection and index synthesis difficult. This thesis proposes a new keyword-selection and index-synthesis technique, the R/S-TDC-EMD-KPCA method. The rescaled range method (R/S) and the time difference correlation method (TDC) are first used to select keywords with predictive power; the search volume of each selected keyword is then denoised by empirical mode decomposition (EMD); finally, kernel principal component analysis (KPCA) synthesizes the denoised series into a comprehensive web search index. Comparative experiments verify the effectiveness of this index for tourist flow forecasting. (2) Different kinds of Internet data capture different aspects of tourist behavior and together reflect tourists' concerns, interests, and emotional tendencies. This thesis proposes a new tourism forecasting method that fuses Internet data such as the Baidu Index and Weibo posts. First, the R/S-TDC-KPCA method synthesizes the Baidu Index series into a comprehensive web search index. Second, text related to the optimal keywords is collected from Sina Weibo, China's mainstream social platform, and cleaned, and a sentiment index is constructed in two ways: by simple addition of positive and negative sentiment, and by treating positive and negative sentiment asymmetrically. Finally, the comprehensive search index, the sentiment index, and historical tourist flow are used as input variables to a SARIMAX model for empirical forecasting.
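The keyword-screening and index-synthesis steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: `hurst_rs`, `best_lead`, and `kpca_first_component` are hypothetical helper names; the screening rule "keep a keyword when its Hurst exponent H exceeds 0.5 and its correlation with tourist flow peaks at a positive lead" is an assumed reading of the R/S and TDC criteria; the RBF kernel and its `gamma` are assumptions; and the EMD denoising step is omitted entirely.

```python
import math
from statistics import mean


def hurst_rs(series, min_window=4):
    """Estimate the Hurst exponent by rescaled-range (R/S) analysis.

    Needs at least 4 * min_window observations so that two window sizes exist.
    Assumed screening rule (not taken from the thesis): keep keywords with H > 0.5.
    """
    n = len(series)
    log_sizes, log_rs = [], []
    size = min_window
    while size <= n // 2:
        rs_values = []
        # Non-overlapping windows of length `size`.
        for start in range(0, n - size + 1, size):
            chunk = series[start:start + size]
            m = mean(chunk)
            running, cum_dev = 0.0, []
            for x in chunk:
                running += x - m
                cum_dev.append(running)
            r = max(cum_dev) - min(cum_dev)                          # range of cumulative deviations
            s = math.sqrt(sum((x - m) ** 2 for x in chunk) / size)  # population standard deviation
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_sizes.append(math.log(size))
            log_rs.append(math.log(mean(rs_values)))
        size *= 2
    # H is the least-squares slope of log(R/S) against log(window size).
    mx, my = mean(log_sizes), mean(log_rs)
    num = sum((a - mx) * (b - my) for a, b in zip(log_sizes, log_rs))
    den = sum((a - mx) ** 2 for a in log_sizes)
    return num / den


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lead(keyword, flow, max_lag=6):
    """Time difference correlation: correlate keyword[t - lag] with flow[t].

    Returns (lag, r) for the lag in 0..max_lag with the largest |r|;
    lag > 0 means the search series leads tourist flow (assumed to be required).
    """
    best_lag, best_r = 0, pearson(keyword, flow)
    for lag in range(1, max_lag + 1):
        r = pearson(keyword[:-lag], flow[lag:])
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


def kpca_first_component(X, gamma=0.5, iters=200):
    """Scores of the rows of X on the first kernel principal component (assumed RBF kernel)."""
    n = len(X)
    K = [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj))) for xj in X]
         for xi in X]
    # Double-centre K so the implicit feature vectors have zero mean.
    row = [mean(r) for r in K]
    tot = mean(row)
    Kc = [[K[i][j] - row[i] - row[j] + tot for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenpair of Kc (positive semi-definite).
    # The start vector must not be constant: the all-ones vector is in Kc's null space.
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    # Score of sample i is sqrt(lambda) * v_i; the overall sign is arbitrary.
    return [math.sqrt(lam) * x for x in v]
```

Here each row of `X` would be one month and each column one screened (and, in the thesis, EMD-denoised) keyword series, typically standardized first; the sign of the resulting index would then be aligned with tourist flow. The hand-written power iteration is only for self-containment; scikit-learn's `KernelPCA(n_components=1, kernel="rbf", gamma=...)` provides the same projection with a proper eigensolver.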

The empirical results show that, compared with other traditional forecasting models, combining the comprehensive web search index built with the R/S-TDC-EMD-KPCA method with a BP neural network yields a lower mean absolute percentage error (MAPE) and normalized root mean square error (NRMSE) in forecasting Hainan tourism: MAPE falls from 10.44% to 7.11%, and NRMSE from 14.66 to 9.81. The proposed R/S-TDC-EMD-KPCA method can therefore extract and synthesize web search information with high quality and can be used effectively as an auxiliary input for tourist flow forecasting. Second, using the comprehensive web search index and the Weibo sentiment index together as predictors effectively improves forecast accuracy: the level-forecast errors are lower than those of every other benchmark model, with MAE falling to 15.23, MAPE to 2.62%, RMSE to 21.77, and RMSPE to 3.47%. In addition, the sentiment index is compiled in two ways, one by simple addition of positive and negative sentiment and one that treats positive and negative sentiment asymmetrically, and the choice of compilation method affects the forecasts to some extent. The asymmetric index, grounded in the differing human psychological responses to positive and negative emotion, reflects tourists' emotional tendencies better than the simple sum and yields better forecasts. In summary, forecasting tourist flow from text and web search information is effective and offers a new route to accurate tourism demand forecasting.
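The accuracy measures reported above, and the two ways of turning post-level sentiment into a monthly index, can be sketched as follows. This is a hedged illustration, not the thesis's formulas. Assumptions: NRMSE is normalized here by the range of the actual series, which may differ from the thesis's definition; each Weibo post is assumed to carry a SnowNLP positivity probability (in SnowNLP this is `SnowNLP(text).sentiments`), with 0.5 as the positive/negative cut-off; and the asymmetric variant weights negative posts by a hypothetical factor `neg_weight`, loosely motivated by loss aversion.

```python
import math


def forecast_errors(actual, predicted):
    """Level-forecast accuracy measures; percentage measures are in % units."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": rmse,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "RMSPE": 100 * math.sqrt(sum((e / a) ** 2 for e, a in zip(errors, actual)) / n),
        # Assumption: NRMSE normalized by the range of the actual values.
        "NRMSE": 100 * rmse / (max(actual) - min(actual)),
    }


def sentiment_indices(scores, neg_weight=2.0, cutoff=0.5):
    """Hypothetical monthly sentiment indices from per-post positivity probabilities.

    Posts scoring exactly `cutoff` are treated as neutral and ignored.
    Returns (simple_index, asymmetric_index), both in [-1, 1].
    """
    pos = sum(1 for s in scores if s > cutoff)
    neg = sum(1 for s in scores if s < cutoff)
    if pos + neg == 0:
        return 0.0, 0.0
    # Simple addition: positive and negative posts count equally.
    simple = (pos - neg) / (pos + neg)
    # Assumed asymmetric variant: each negative post counts neg_weight times.
    weighted_neg = neg_weight * neg
    asymmetric = (pos - weighted_neg) / (pos + weighted_neg)
    return simple, asymmetric
```

In a forecasting setup like the one described, such monthly indices and the search index would enter the model as exogenous regressors; in Python that role is commonly filled by `statsmodels.tsa.statespace.sarimax.SARIMAX(endog, exog=..., order=..., seasonal_order=...)`.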

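Degree type: Master's
Defense date: 2023-05-20
Place of degree conferral: Lanzhou, Gansu Province
Language: Chinese
Total pages: 64
Total references: 58
Call number: 0005011
Confidentiality: Public
Chinese Library Classification number: C8/337
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/33925
Collection: School of Statistics and Data Science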
Recommended citation (GB/T 7714):
曹静如. 基于文本和网络搜索信息的游客流量预测研究——以海南为例[D]. 甘肃省兰州市: 兰州财经大学, 2023.
Files in this item:
2020000003043.pdf (2006 KB, Adobe PDF); document type: thesis; access: open access; license: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.