Institutional Repository of School of Statistics
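Author | 倪志恒 |
Name (Hanyu Pinyin) | nizhiheng |
Student ID | 2018000003128 |
Degree-granting institution | Lanzhou University of Finance and Economics (兰州财经大学) |
Phone | 15856919375 |
Email | nzhsta@163.com |
Enrollment date | 2018-9 |
Degree category | Professional master's |
Level of study | Master's student |
First-level discipline | Applied Statistics |
Discipline code | 0252 |
First supervisor | 杨盛菁 |
First supervisor (Hanyu Pinyin) | yangshengjing |
First supervisor's institution | Lanzhou University of Finance and Economics (兰州财经大学) |
First supervisor's title | Professor |
Title | 基于文本挖掘的虚假评论识别 |
English title | False comment recognition based on Text Mining |
Keywords | Fake reviews; text mining; model fusion; neural network; behavior pattern |
Foreign-language keywords | False comments ; Text mining ; Neural network ; Model fusion ; Behavior pattern |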
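Abstract | In recent years, with the rapid development of new retail and mobile online payment, consumers' demand for diverse shopping channels has grown steadily, and online shopping has become an important channel for consumption. Online shopping also generates large volumes of online reviews, which strongly affect merchants' sales. Because the Internet is highly open, many merchants have begun to pay attention to the reviews that online shoppers leave, and, driven by profit, some manipulate reviews behind the scenes. The resulting fake reviews seriously harm consumers' rights and interests. Identifying fake reviews is therefore an urgent task, whether viewed from the perspective of individual consumers or of merchants and platforms. The daily volume of new reviews is very large, however, and manual review would consume substantial labor and resources, so an effective method is needed to screen online reviews automatically and remove fake ones. This thesis aims to provide a fake review identification method for e-commerce platforms that identifies fake reviews quickly, accurately, and effectively, and to summarize the behavior patterns of fake reviews. The main work is as follows. First, review data are collected from an e-commerce platform, the text is cleaned, the word-segmented text is vectorized appropriately, and the data are labeled using review posting time, duplicate reviews, reviewer sentiment, and similar cues. Second, base classifiers are trained on the vectorized text using traditional machine learning and deep learning methods, and the base classifiers are combined through ensemble learning to identify fake reviews. The trained model is then used to predict labels for a large number of unlabeled reviews, and the behavioral attributes and behavior patterns of fake reviews are mined through language models of fake reviews, topic difference analysis, part-of-speech analysis, and sentiment analysis. The experimental results show that the method performs well at identifying fake reviews, offers consumers and regulators a new approach, and has practical application value. |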
English abstract | In recent years, with the rapid development of new retail and mobile network payment, people's demand for diverse consumption channels has grown ever stronger, and online shopping has become an important way for people to consume and shop. At the same time, online shopping produces a large number of online comments, and these comments have a large impact on merchants' sales. Because the Internet is highly open, many businesses have begun to pay close attention to the comments generated by consumers' online shopping. Driven by their own interests, some businesses manipulate comments behind the scenes, and false comments have begun to appear online, seriously harming the rights of consumers. Therefore, whether from the perspective of individual consumers or from the perspective of businesses and platforms, identifying fake review information is an urgent task. However, the daily increment of online comments is huge, and reviewing comments manually would cost a great deal of manpower and material resources. Therefore, an effective identification method is needed to screen online comments automatically and eliminate false comments. The purpose of this thesis is to provide a set of identification methods for fake comments on e-commerce platforms, to identify fake comments quickly, accurately and effectively, and to summarize the behavior patterns of fake comments. This thesis identifies false comments mainly by means of text mining. The main work includes: Firstly, the relevant comment data were obtained from the e-commerce platform, and the text data were cleaned. The word-segmented text data were vectorized appropriately, and the data were annotated using comment release time, repeated comments, and the emotional tendency of the commenters.
Secondly, base classifiers are trained on the vectorized text with traditional machine learning and deep learning methods, and the base classifiers are combined through ensemble learning to recognize the text comment data. The trained model is used to predict labels for a large number of unlabeled comments, and the behavior attributes and behavior patterns of fake comments are mined through a language model of fake comments, topic difference analysis, part-of-speech analysis, sentiment analysis and other work. The experimental results show that this method performs well in identifying false comments, provides a new method for consumers and regulators, and has practical application value. |
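Degree type | Master's |
Defense date | 2021-05-15 |
Degree-granting location | Lanzhou, Gansu Province |
Language | Chinese |
Total pages | 67 |
Total references | 54 |
Collection number | 0003679 |
Confidentiality level | Public |
Chinese Library Classification number | C8/262 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/29623 |
Collection | School of Statistics and Data Science |
Recommended citation (GB/T 7714) | 倪志恒. 基于文本挖掘的虚假评论识别[D]. 甘肃省兰州市. 兰州财经大学,2021. |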
Files in this item |
File name/size | Document type | Version type | Access type | License |
20180000003128-倪志恒-基(1888KB) | Thesis | | Open access | CC BY-NC-SA |
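
The abstract describes training several base classifiers on vectorized review text and combining them through ensemble learning. The record does not name the classifiers, the vectorization, or the fusion rule, so the following is only a minimal sketch under stated assumptions: bag-of-words token counts as the vectorization, three simple stand-in base classifiers (naive Bayes, nearest centroid, perceptron), majority voting as the fusion rule, and an invented six-review toy data set with label 1 for fake and 0 for genuine. None of the names or data below come from the thesis.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: whitespace tokenization. Chinese reviews would need a
    # word segmenter before this step.
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def score(c):
            s = math.log(self.prior[c])
            for w in doc:
                if w in self.vocab:  # ignore words never seen in training
                    s += math.log((self.counts[c][w] + 1)
                                  / (self.total[c] + len(self.vocab)))
            return s
        return max(self.classes, key=score)

class NearestCentroid:
    """Assigns the class whose summed count vector is closest in cosine similarity."""
    def fit(self, docs, labels):
        self.centroids = {}
        for doc, y in zip(docs, labels):
            self.centroids.setdefault(y, Counter()).update(doc)
        return self

    def predict(self, doc):
        vec = Counter(doc)
        def cosine(a, b):
            dot = sum(a[w] * b[w] for w in a)  # Counter yields 0 for missing words
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0
        return max(sorted(self.centroids),
                   key=lambda c: cosine(vec, self.centroids[c]))

class Perceptron:
    """Binary perceptron over token counts; labels must be 0 or 1."""
    def __init__(self, epochs=10):
        self.epochs = epochs

    def fit(self, docs, labels):
        self.w, self.b = Counter(), 0
        for _ in range(self.epochs):
            for doc, y in zip(docs, labels):
                if self.predict(doc) != y:  # update weights only on mistakes
                    sign = 1 if y == 1 else -1
                    for t in doc:
                        self.w[t] += sign
                    self.b += sign
        return self

    def predict(self, doc):
        return 1 if sum(self.w[t] for t in doc) + self.b > 0 else 0

def majority_vote(models, doc):
    # Fusion rule assumed here: each base classifier casts one vote.
    votes = Counter(m.predict(doc) for m in models)
    return votes.most_common(1)[0][0]

# Invented toy data for illustration only: 1 = fake, 0 = genuine.
train = [
    ("best product ever buy now amazing amazing", 1),
    ("amazing amazing best seller buy now", 1),
    ("best ever amazing deal buy buy", 1),
    ("battery lasts two days screen is a bit dim", 0),
    ("arrived late but the fabric feels sturdy", 0),
    ("works fine the strap broke after a month", 0),
]
docs = [tokenize(text) for text, _ in train]
labels = [y for _, y in train]
models = [NaiveBayes().fit(docs, labels),
          NearestCentroid().fit(docs, labels),
          Perceptron().fit(docs, labels)]

for review in ["amazing best buy now", "the screen is dim but battery works"]:
    print(review, "->", majority_vote(models, tokenize(review)))
```

Using an odd number of base classifiers with binary labels avoids tied votes. A stacking or weighted scheme could replace `majority_vote` without changing the base classifiers.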
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.