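Author: 刘万金
Name (Hanyu Pinyin): Liu Wanjin
Student ID: 2020000003011
Institution: Lanzhou University of Finance and Economics
Phone: 18894312889
Email: 1639943906@qq.com
Enrollment date: 2020-9
Degree category: Academic master's
Level of study: Master's student
Discipline category: Science
First-level discipline: Statistics
Research field: Statistics
Discipline code: 0714Z3
First advisor: 高海燕
First advisor (Hanyu Pinyin): Gao Haiyan
First advisor's institution: Lanzhou University of Finance and Economics
First advisor's title: Professor
Title: 基于对称非负矩阵分解的鲁棒聚类算法研究
English title: Research on Robust Clustering Algorithm Based on Symmetric Nonnegative Matrix Factorization
Keywords: 对称非负矩阵分解; 鲁棒性; 聚类算法
English keywords: Symmetric nonnegative matrix factorization; Robustness; Clustering algorithm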
Abstract

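Symmetric nonnegative matrix factorization (SNMF) is a graph-based clustering algorithm that captures the cluster structure embedded in a graph representation more naturally and obtains better clustering results on both linear and nonlinear data, but it is sensitive to the initialization of its variables. In addition, the standard SNMF algorithm measures factorization quality with the sum of squared errors, which makes it sensitive to noise and outliers. To address these problems, a robust self-adaptive symmetric nonnegative matrix factorization clustering algorithm, RS3NMF (Robust Self-adaptive Symmetric Nonnegative Matrix Factorization), is proposed from an ensemble learning perspective. Furthermore, label information from the training set is used to strengthen the discriminative ability of the projection matrix, and robustness, adaptive learning, and label information are integrated into the SNMF framework to give a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm, RADS3NMF (Robust Adaptive Learning Discriminative Symmetric Nonnegative Matrix Factorization). The thesis consists of the following two parts: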

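(1) Inspired by robust nonnegative matrix factorization, adaptive methods, and ensemble learning, a robust self-adaptive symmetric nonnegative matrix factorization clustering algorithm (RS3NMF) is built, which incorporates robustness into the SNMF framework. The norm-based RS3NMF model mitigates the influence of noise and outliers, preserves feature rotation invariance, and improves the robustness of the model. At the same time, without relying on any additional information, it exploits the sensitivity of SNMF to initialization to improve clustering performance step by step. The model is optimized by alternating iteration, and convergence of the objective function value is guaranteed. Extensive experiments show that RS3NMF outperforms the other state-of-the-art algorithms it is compared with and is strongly robust. In addition, the algorithm is applied to GDP data for China's 31 provinces and municipalities; the resulting partition reveals development differences among the provinces, which indicates good practical value.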

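(2) Inspired by the self-representation learning method of spatial clustering, a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF) is built by introducing a norm, adaptive learning, and label information. Specifically, the affinity matrix is first represented by the learned self-representation coefficients, and the label information of the training set is used to strengthen the discriminative ability of the projection matrix. Second, the model is optimized: an auxiliary function is constructed, convergence of the model is proved, and the computational complexity of the algorithm is given. Finally, using hourly nitrogen dioxide (NO2) concentration data for Beijing over a certain period, the algorithm is applied to a cluster analysis of the layout of Beijing's air quality monitoring stations. The results show that RADS3NMF identifies the spatial layout of the monitoring stations well and has good applicability.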
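For readers unfamiliar with SNMF, the basic factorization that both proposed algorithms build on can be sketched as follows. This is only a minimal illustration, assuming the common formulation min ||A - H H^T||_F^2 subject to H >= 0 and a damped multiplicative update with beta = 0.5 known from the general SNMF literature. It is not the RS3NMF or RADS3NMF algorithm of the thesis, and the names `snmf` and `snmf_error`, the toy matrix `A`, and all parameter values are hypothetical.

```python
import numpy as np

def snmf(A, k, n_iter=300, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0.

    Uses a damped multiplicative update (an assumed textbook scheme,
    not the update rule derived in the thesis).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))      # random nonnegative start
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps      # eps guards against division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

def snmf_error(A, H):
    """Squared Frobenius reconstruction error ||A - H H^T||_F^2."""
    return np.linalg.norm(A - H @ H.T, "fro") ** 2

# Toy affinity matrix: two groups of three mutually similar points.
A = np.kron(np.eye(2), np.ones((3, 3)))
H = snmf(A, k=2)
labels = H.argmax(axis=1)                # cluster = column with largest weight
print(labels, snmf_error(A, H))
```

Changing `seed` changes the starting point of the iteration. On harder affinity matrices, different seeds can end in different local solutions, which is the initialization sensitivity the abstract refers to.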

English abstract

As a graph-based clustering algorithm, symmetric nonnegative matrix factorization (SNMF) captures the cluster structure embedded in a graph representation more naturally and obtains better clustering results on both linear and nonlinear data, but it is sensitive to the initialization of its variables. In addition, the standard SNMF algorithm measures the quality of the factorization with the sum of squared errors, which makes it sensitive to noise and outliers. To solve these problems, a robust adaptive symmetric nonnegative matrix factorization clustering algorithm (RS3NMF) is proposed from the perspective of ensemble learning. Furthermore, the discriminative ability of the projection matrix is enhanced by incorporating the label information of the training set, and a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF) is proposed by integrating robustness, adaptive graph learning, and label information into the SNMF framework. The main research contents of this thesis comprise the following two parts:

(1) Inspired by robust nonnegative matrix factorization, adaptive methods, and ensemble learning, a robust adaptive symmetric nonnegative matrix factorization clustering algorithm (RS3NMF) is constructed, which integrates robustness into the SNMF framework. The norm-based RS3NMF model alleviates the influence of noise and outliers, maintains feature rotation invariance, and improves the robustness of the model. At the same time, without any additional information, clustering performance is gradually enhanced by exploiting the sensitivity of SNMF to initialization. An alternating iteration method is used for optimization, and the convergence of the objective function value is guaranteed. Extensive experimental results show that the proposed RS3NMF algorithm outperforms other advanced algorithms and is strongly robust. In addition, an application to GDP data for 31 provinces and municipalities in China shows that the robust clustering algorithm can identify development differences among provinces and has good practical value.

(2) Inspired by the spatial clustering self-representation learning method, a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF) is constructed by introducing a norm, adaptive learning, and label information. Specifically, the affinity matrix is first represented by the learned self-representation coefficients, and the discriminative ability of the projection matrix is enhanced by using the label information of the training set. Second, the model is optimized: an auxiliary function is constructed, the convergence of the model is proved, and the computational complexity of the algorithm is given. Finally, using hourly nitrogen dioxide (NO2) concentration data for Beijing over a certain period, the algorithm is applied to a cluster analysis of air quality monitoring stations in Beijing. The results show that the RADS3NMF algorithm identifies the spatial layout of the air quality monitoring stations well and has good applicability.
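This record does not show which norm the thesis uses. Purely as an illustration, the sketch below assumes the L2,1 norm (the sum of the Euclidean norms of the rows of a residual matrix), a frequent choice in robust NMF, and checks the two properties the abstract attributes to its norm: invariance under rotation of the feature axes and a milder reaction to outliers than the squared error. The function names and the synthetic residual matrix `E` are hypothetical.

```python
import numpy as np

def frobenius_sq(E):
    """Sum of squared entries, the loss used by standard SNMF."""
    return float(np.sum(E ** 2))

def l21_norm(E):
    """L2,1 norm: sum of the Euclidean norms of the rows (assumed norm)."""
    return float(np.sum(np.linalg.norm(E, axis=1)))

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(50, 3))   # synthetic residuals, one row per sample

# Rotation invariance: multiplying by an orthogonal matrix keeps row norms.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated_same = np.isclose(l21_norm(E @ Q), l21_norm(E))

# Outlier sensitivity: scale the residual of sample 0 by a factor of 100.
E_out = E.copy()
E_out[0] *= 100.0
fro_growth = frobenius_sq(E_out[:1]) / frobenius_sq(E[:1])  # grows quadratically
l21_growth = l21_norm(E_out[:1]) / l21_norm(E[:1])          # grows linearly

print(rotated_same, fro_growth, l21_growth)
```

Under the squared error the outlying sample's contribution grows with the square of the scaling factor, while under the L2,1 norm it grows only linearly, so a single corrupted sample dominates the objective far less.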

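Degree type: Master's
Defense date: 2023-05
Degree-granting location: Lanzhou, Gansu Province
Language: Chinese
Total pages: 59
Total references: 45
Collection number: 0004819
Confidentiality level: Public
Chinese Library Classification number: O212/29
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/34335
Collection: School of Statistics and Data Science
Recommended citation
GB/T 7714
Liu Wanjin. 基于对称非负矩阵分解的鲁棒聚类算法研究[D]. Lanzhou, Gansu Province. Lanzhou University of Finance and Economics, 2023.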
Files in this item
File name/size: 基于对称非负矩阵分解的鲁棒聚类算法研究.(6061KB)
Document type: Thesis
Access type: Open access
License: CC BY-NC-SA
File name: 基于对称非负矩阵分解的鲁棒聚类算法研究.pdf
Format: Adobe PDF