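Author: 王文瑞 (Wang Wenrui)
Name in Hanyu Pinyin: Wang Wenrui
Student ID: 2019000010010
Training institution: Lanzhou University of Finance and Economics (兰州财经大学)
Phone: 18794809280
Email: 18794809280@163.com
Enrollment date: 2019-9
Degree category: Academic master's
Training level: Master's student
Discipline category: Management
First-level discipline: Management Science and Engineering
Discipline direction:
Discipline code: 1201
Degree conferred: Master of Management
First supervisor: 李强 (Li Qiang)
First supervisor's name in Hanyu Pinyin: Li Qiang
First supervisor's institution: Lanzhou University of Finance and Economics
First supervisor's title: Professor
Title: 融合语法信息的交互式方面级情感分析研究 (Research on Interactive Aspect-Level Sentiment Analysis Incorporating Syntactic Information)
English title: Aspect-Level Sentiment Analysis Based on Syntactic Information of Interactive Attention
Keywords: BERT pre-trained language model; syntactic information; aspect word extraction; interactive attention
Foreign-language keywords: BERT pre-trained language model; Grammatical information; Aspectual word extraction; Interactive attention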
Abstract

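As the Internet has spread rapidly across all age groups, more and more people have joined social platforms and e-commerce shopping. While communicating online, people leave behind large numbers of opinions and reviews. For consumers, these online texts, which carry subjective sentiment, can serve as a basis for purchasing decisions, and text sentiment analysis emerged in response. The main task of aspect-level sentiment analysis is to determine the sentiment polarity associated with each evaluated entity; these polarities indicate how satisfied users are with a product and so provide a reference for other potential consumers. The large volume of data available online gives a strong impetus to the application of deep learning and opens up new approaches to judging sentiment polarity.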

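Existing methods apply attention mechanisms during aspect-level sentiment classification, assigning weights to text features in order to extract semantic-level information. They neglect syntactic information, so the opinions attached to different aspect words cannot be used effectively for sentiment classification. In addition, previous studies did not model the context and the aspect entities separately, so when a text contains several aspect entities, each entity cannot be matched well with its corresponding sentiment polarity. To address these problems, this thesis proposes an interactive sentiment analysis model based on syntactic information (Aspect-Level Sentiment Analysis Based on Syntactic Information of Interactive Attention, hereinafter the SICA model). The model has two main parts: one extracts contextual features of the text, and the other extracts features of the specific aspect words. The extracted contextual features and aspect-specific features are then concatenated and fed into an interactive attention layer, where they interact to produce the sentiment classification result. The contextual feature extraction part consists of a word embedding layer, a syntactic information extraction layer, a convolutional layer, and a bidirectional long short-term memory (BiLSTM) network layer. The BERT pre-trained language model serves as the word-vector model for both the context and the aspect words. The syntactic information layer takes the middle and lower layers of BERT as its main source of information and a dependency syntax tree annotated with dependency relations as auxiliary information. The convolutional layer captures global semantic information. In sentiment analysis tasks, polysemy is common, especially in Chinese; the BiLSTM carries rich semantic information and can capture polysemy effectively, which a unidirectional LSTM cannot handle. In the aspect-specific feature extraction part, the dependency syntax tree assists in extracting the aspect words; because aspect words belonging to different aspects would otherwise be poorly exploited, a convolutional neural network extracts their features. The interactive attention module also sets a threshold: when an interaction weight is too small, a second interaction can be performed so that the text information is used more fully. Finally, the sentiment output layer produces the result, which determines the sentiment polarity of the text.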
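The threshold-triggered second interaction described above can be illustrated with a minimal sketch. This is not the thesis implementation: the dot-product scoring, the averaged query vectors, the rule "interact again if the largest context weight is below the threshold", and all function names are assumptions made here for illustration, and plain Python lists stand in for the BERT, convolutional, and BiLSTM features.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attend(keys, query):
    """Dot-product attention: weights over `keys` and their weighted sum."""
    weights = softmax([dot(k, query) for k in keys])
    pooled = [sum(w * k[d] for w, k in zip(weights, keys))
              for d in range(len(query))]
    return weights, pooled

def interactive_attention(context, aspect, threshold=0.5):
    """Let context and aspect attend to each other, optionally twice.

    Assumed rule (hypothetical): if the largest context weight after the
    first interaction is below `threshold`, run a second interaction that
    uses the attended vectors as queries.
    """
    # First interaction: each side is queried by the other side's average.
    ctx_w, ctx_vec = attend(context, mean_vector(aspect))
    asp_w, asp_vec = attend(aspect, mean_vector(context))
    rounds = 1
    if max(ctx_w) < threshold:
        # Second interaction: re-query the context with the attended aspect
        # vector, then re-query the aspect with the new context vector.
        ctx_w, ctx_vec = attend(context, asp_vec)
        asp_w, asp_vec = attend(aspect, ctx_vec)
        rounds = 2
    # Concatenate both representations as input to a classifier layer.
    return ctx_vec + asp_vec, ctx_w, rounds

# Toy 2-dimensional features: three context tokens and one aspect token.
context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
aspect = [[1.0, 0.0]]
features, weights, rounds = interactive_attention(context, aspect, threshold=0.9)
print(rounds, [round(w, 3) for w in weights])
```

In this toy input no context token receives a weight above 0.9, so the sketch performs the second pass; a lower threshold such as 0.3 leaves it at one pass.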

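The main innovations of the SICA model proposed in this thesis are as follows:

First, the phrase-structure information in the middle and lower layers of BERT is combined with a dependency syntax tree annotated with dependency relations to introduce syntactic structure; together they form the model's syntactic information layer, which serves as the extraction layer for all syntactic information.

Second, previous studies did not model the aspect words and the context separately. This thesis extracts features of the specific aspect-word sequence, using the dependency syntax tree as auxiliary information.

Third, an interaction mechanism is set up in which the specific aspect-word sequence interacts with the context sequence. A scoring function is defined and the interaction weights are computed a second time, so that the relevant word information is extracted comprehensively.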

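The proposed SICA model was evaluated experimentally on three public datasets: Restaurant, Laptop, and Twitter. Performance comparisons with other models, together with the model's accuracy and F1 scores on these three datasets, demonstrate the effectiveness of SICA and show that syntactic information makes a clear contribution to model performance.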
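For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and F1 for a three-class polarity task. The abstract does not say which F1 average is used; macro-averaging is an assumption here, and the labels and predictions are toy values, not results from the thesis.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(labels)

# Toy gold labels and predictions, for illustration only.
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]
labels = ["positive", "negative", "neutral"]
print(accuracy(gold, pred))                    # 0.75
print(round(macro_f1(gold, pred, labels), 3))  # 0.6
```

Macro-averaging weights every class equally, which is why the single missed neutral example pulls F1 well below accuracy in this toy case.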

English abstract

With the rapid spread of the Internet across all age groups, more and more people are joining social platforms and shopping online. As they communicate on the network, they leave behind a large number of opinions and reviews. These texts carry subjective sentiment and can serve consumers as a basis for purchasing decisions, which has given rise to text sentiment analysis. The main task of aspect-level sentiment analysis is to determine the sentiment polarity of each evaluated entity. These polarities indicate how satisfied users are with a product and thus provide a reference for other potential customers. The large amount of data available online gives strong momentum to the application of deep learning and opens up new ways of judging sentiment polarity.

Existing approaches use attention mechanisms in aspect-level sentiment classification, assigning weights to text features to extract semantic-level information. They ignore syntactic information, so the opinions associated with different aspect words cannot be used well for sentiment classification. Previous studies also do not model the context and the aspect entities separately, so when a text contains multiple aspect entities, each entity cannot be matched well with its corresponding sentiment polarity. To address these problems, this paper proposes Aspect-Level Sentiment Analysis Based on Syntactic Information of Interactive Attention (hereinafter the SICA model). The model consists of two main parts: one extracts contextual features of the text, and the other extracts aspect-specific word features. The extracted contextual features and aspect-specific features are then concatenated and fed into the interactive attention layer, where they interact to produce the sentiment classification result. The contextual feature extraction part comprises a word embedding layer, a syntactic information extraction layer, a convolutional layer, and a bidirectional long short-term memory (BiLSTM) network layer. The BERT pre-trained language model is used as the word-vector model for both the context and the aspect words. The syntactic information layer uses the middle and lower layers of BERT as its main information source and a dependency syntax tree annotated with dependency relations as auxiliary information. The convolutional layer captures global semantic information. In sentiment analysis tasks, especially in Chinese, words often have multiple meanings; the BiLSTM carries rich semantic information and can capture such polysemy effectively, which a unidirectional LSTM cannot handle. In the aspect-specific feature extraction part, the dependency syntax tree is used as an aid to extract aspect words; because aspect words from different aspects would otherwise not be well utilized, their features are extracted by a convolutional neural network. The interactive attention module also sets a threshold: when an interaction weight is too small, a second interaction can be performed so that the text information is fully used. Finally, the result is produced by the sentiment output layer and used to judge the sentiment polarity of the text.

The main innovations of the SICA model proposed in this paper are as follows:

First, the phrase-structure information in the middle and lower layers of the BERT model is combined with a dependency syntax tree that marks dependency relations to form the model's syntactic information layer, which serves as the extraction layer for all syntactic information.

Second, previous studies did not model aspect words and context information separately. In this paper, features of the specific aspect-word sequence are extracted with the dependency syntax tree as auxiliary information.

Third, an interaction mechanism is set up in which the specific aspect-word sequence interacts with the context sequence. A scoring function is defined and the interaction weights are computed a second time, so that relevant word information is extracted comprehensively.
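To make the role of the dependency tree in the second point more concrete, the sketch below collects the words that lie within a given number of hops of an aspect word in a dependency tree. It is only an illustration under stated assumptions: the head-index encoding, the function names, and the hand-written parse of the example sentence are all hypothetical, and the thesis may use the tree quite differently.

```python
from collections import deque

def tree_distances(heads, start):
    """Hop distance from token `start` to every token reachable in the tree.

    `heads[i]` is the index of token i's head, or -1 for the root.
    """
    # Treat dependency arcs as undirected edges.
    neighbors = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)
    # Breadth-first search from the aspect token.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def syntactic_neighbors(tokens, heads, aspect_index, max_hops=1):
    """Tokens within `max_hops` arcs of the aspect token, in sentence order."""
    dist = tree_distances(heads, aspect_index)
    return [tokens[i] for i in sorted(dist) if 0 < dist[i] <= max_hops]

# Hand-written (assumed) parse of a sentence with two aspect words.
tokens = ["The", "food", "was", "great", "but", "the", "service", "was", "slow"]
heads = [1, 3, 3, -1, 8, 6, 8, 8, 3]
print(syntactic_neighbors(tokens, heads, tokens.index("food")))     # ['The', 'great']
print(syntactic_neighbors(tokens, heads, tokens.index("service")))  # ['the', 'slow']
```

In this example each aspect word is one arc away from its own opinion word and two arcs away from the other one, which is the kind of signal that word order alone does not provide.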

The SICA model proposed in this paper was evaluated on three public datasets: Restaurant, Laptop, and Twitter. Comparisons with other models, together with the model's accuracy and F1 scores on these three datasets, demonstrate the effectiveness of SICA and show that syntactic information contributes clearly to improving model performance.

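Degree type: Master's
Defense date: 2022-05-29
Place of degree conferral: Lanzhou, Gansu Province
Degree major: Management Science and Engineering
Subject area: Management Science and Engineering (degrees in Management or Engineering may be conferred)
Research direction: Information Management and Information Systems
Language: Chinese
Total pages: 66
Total references: 60
Collection number: 0004263
Confidentiality level: Public
Chinese Library Classification number: C93/69
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/32301
Collection: School of Information Engineering and Artificial Intelligence
Recommended citation
GB/T 7714
Wang Wenrui. Research on Interactive Aspect-Level Sentiment Analysis Incorporating Syntactic Information (融合语法信息的交互式方面级情感分析研究)[D]. Lanzhou, Gansu Province. Lanzhou University of Finance and Economics, 2022.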
Files in this item
File name/size: 2019000010010.pdf (2034 KB); document type: thesis; access type: open access; license: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.