Institutional Repository of School of Statistics
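Author | 杨丽娜 |
Name in pinyin | yanglina |
Student ID | 2018000003135 |
Training institution | Lanzhou University of Finance and Economics (兰州财经大学) |
Phone | 13340365692 |
Email | 1358898035@qq.com |
Year of enrollment | 2018-9 |
Degree category | Professional master's |
Education level | Master's student |
First-level discipline | Applied Statistics |
Discipline code | 0252 |
First supervisor | 黄恒君 |
First supervisor's name in pinyin | huanghengjun |
First supervisor's institution | Lanzhou University of Finance and Economics (兰州财经大学) |
First supervisor's title | Professor |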
Title | 基于LSTM的微博话题“超前点播”舆情分析 |
English title | Analysis of Public Opinion on Weibo Topic "Advanced On-demand" Based on LSTM |
Keywords | Weibo (microblog) topic; similarity analysis; LSTM; text-CNN; public opinion development trend |
Abstract | Social networking sites mirror the patterns and behavior of people's everyday communication and interaction. Users have by now published a large volume of subjective text online, so the study of social-network text is of considerable value to the industries concerned as they develop in the Internet era. How to mine netizens' sentiment from subjective online text has become a hot research topic in natural language processing. This thesis therefore combines deep-learning classification with an analysis of public opinion development trends to study data from the Weibo topic "advanced on-demand" (超前点播) and to uncover consumers' attitudes and opinions on such issues. A web crawler collected the comments and posts under the topic, together with variables describing their authors, as the raw data. Sentiment-orientation analysis was then carried out with classification models built using natural language processing techniques: 1) text preprocessing: deduplication, missing-value handling, regular-expression cleaning, word segmentation, and stop-word removal; 2) word-vector construction: a word2vec model is built on the text, and distances between words are computed for sentiment-orientation classification and semantic similarity analysis; 3) classification: text-CNN and LSTM classifiers are built and compared, and the better-performing model is used for the experimental analysis; 4) hot-word visualization: high-frequency words for "advanced on-demand" are displayed to reveal netizens' attitudes and opinions; 5) trend analysis: posting times and post volumes for the topic are used to show and study the development of public opinion at each stage, along with the characteristics of each stage and netizens' sentiment. The result is an analysis of the "advanced on-demand" public opinion from a quantitative perspective. The experiments show that, for this topic, the LSTM model classifies sentiment-bearing text better, with higher precision and recall. The analysis also finds that the public opinion event "iQIYI's advanced on-demand for Joy of Life (庆余年) ruled unlawful" had a short formation period and rose at a linear rate in its early stage straight to its peak. In addition, the share of negative posts during the peak period exceeded the share of negative posts in the whole data set, which suggests that users in the peak period were mainly venting their emotions. |
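Degree type | Master's |
Defense date | 2021-05-15 |
Place of degree conferral | Lanzhou, Gansu Province |
Language | Chinese |
Total pages | 55 |
Number of references | 36 |
Collection number | 0003685 |
Confidentiality level | Internal |
Chinese Library Classification number | C8/268 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/29601 |
Collection | School of Statistics and Data Science (统计与数据科学学院) |
Recommended citation (GB/T 7714) | 杨丽娜. 基于LSTM的微博话题“超前点播”舆情分析[D]. 甘肃省兰州市. 兰州财经大学,2021. |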
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | ||
杨丽娜_基于LSTM的微博话题“超前点播(2637KB) | Thesis | | Not yet open | CC BY-NC-SA | Request full text |
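The preprocessing step (1) and the trend analysis step (5) described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's own code: the function names `preprocess` and `daily_negative_share`, the tiny `STOP_WORDS` set, the whitespace tokenizer standing in for a Chinese word segmenter, the `'neg'` label, and the timestamp format are all hypothetical, and the cleaning step is read here as regular-expression cleaning.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical stop-word list; the record does not say which list was used.
STOP_WORDS = {"的", "了", "是"}


def preprocess(posts, tokenize=str.split):
    """Deduplicate, drop missing texts, clean with regexes, segment, remove stop words.

    `tokenize` defaults to whitespace splitting as a stand-in for a real
    Chinese word segmenter (an assumption, not the thesis's tool).
    """
    seen, result = set(), []
    for text in posts:
        if text is None or not text.strip():        # missing-value handling
            continue
        # Strip URLs, @mentions and #topic# tags, then any remaining punctuation.
        text = re.sub(r"https?://\S+|@\S+|#[^#]+#", " ", text)
        text = re.sub(r"[^\w\s]", " ", text)
        text = " ".join(text.split())               # collapse whitespace
        if not text or text in seen:                # deduplicate cleaned texts
            continue
        seen.add(text)
        result.append([t for t in tokenize(text) if t not in STOP_WORDS])
    return result


def daily_negative_share(records):
    """Map each posting day to (post count, share of posts labelled 'neg').

    `records` holds (timestamp, label) pairs with timestamps formatted as
    'YYYY-MM-DD HH:MM'; both the format and the label name are assumptions.
    """
    total, negative = Counter(), Counter()
    for timestamp, label in records:
        day = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date()
        total[day] += 1
        if label == "neg":
            negative[day] += 1
    return {day: (total[day], negative[day] / total[day]) for day in sorted(total)}
```

A real segmenter can be passed in as `tokenize`, and comparing the negative share on peak days with the share over the whole data set mirrors the comparison reported at the end of the abstract.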
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.