Institutional Repository of School of Statistics
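Author | 刘万金 |
Name in Hanyu Pinyin | Liu Wanjin |
Student ID | 2020000003011 |
Institution | Lanzhou University of Finance and Economics (兰州财经大学) |
Phone | 18894312889 |
Email | 1639943906@qq.com |
Year of enrollment | 2020-9 |
Degree category | Academic master's |
Level of study | Master's student |
Discipline category | Science |
First-level discipline | Statistics |
Research field | Statistics |
Discipline code | 0714Z3 |
First supervisor | 高海燕 |
First supervisor's name in Hanyu Pinyin | Gao Haiyan |
First supervisor's institution | Lanzhou University of Finance and Economics (兰州财经大学) |
First supervisor's title | Professor |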
Title | 基于对称非负矩阵分解的鲁棒聚类算法研究 |
English title | Research on Robust Clustering Algorithm Based on Symmetric Nonnegative Matrix Factorization |
Keywords | symmetric nonnegative matrix factorization; robustness; clustering algorithm |
English keywords | Symmetric nonnegative matrix factorization; Robustness; Clustering algorithm |
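Abstract | Symmetric nonnegative matrix factorization (SNMF) is a graph-based clustering algorithm that captures the cluster structure embedded in a graph representation more naturally and gives better clustering results on both linear and nonlinear data, but it is sensitive to how its variables are initialized. In addition, the standard SNMF algorithm measures the quality of the factorization by the sum of squared errors, which makes it sensitive to noise and outliers. To address these problems, a robust self-adaptive symmetric nonnegative matrix factorization clustering algorithm, RS3NMF (Robust Self-adaptive Symmetric Nonnegative Matrix Factorization), is proposed from the perspective of ensemble learning. Furthermore, the label information of the training set is used to strengthen the discriminative ability of the projection matrix, and robustness, adaptive learning and label information are integrated into the SNMF framework to give a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF). The thesis has two main parts. (1) Inspired by robust nonnegative matrix factorization, adaptive methods and ensemble learning, the RS3NMF clustering algorithm is established, building robustness into the SNMF framework. The norm-based RS3NMF model reduces the influence of noise and outliers, preserves invariance to feature rotation, and improves the robustness of the model. At the same time, without relying on any additional information, it exploits the sensitivity of SNMF to initialization to improve clustering performance step by step. The model is optimized by alternating iteration, and convergence of the objective function value is guaranteed. Extensive experiments show that the proposed RS3NMF algorithm outperforms other advanced algorithms and is strongly robust. In addition, an application to GDP data for China's 31 provinces and municipalities shows that the clusters produced by the algorithm reflect differences in development between provinces, which indicates good practical value. (2) Inspired by self-representation learning methods for spatial clustering, the RADS3NMF algorithm is established by introducing a norm, adaptive learning and label information. Specifically, the affinity matrix is first expressed through the learned self-representation coefficients, and the label information of the training set is used to strengthen the discriminative ability of the projection matrix. Next, the model is solved by optimization: an auxiliary function is constructed, convergence of the model is proved, and the computational complexity of the algorithm is given. Finally, using hourly nitrogen dioxide (NO2) concentration data for Beijing over a certain period, the algorithm is applied to a cluster analysis of the siting of Beijing's air quality monitoring stations. The results show that RADS3NMF identifies the spatial layout of the monitoring stations well and is well suited to this application. |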
English abstract | As a graph-based clustering algorithm, symmetric nonnegative matrix factorization (SNMF) can capture the clustering structure embedded in a graph representation more naturally and obtains better clustering results on linear and nonlinear data, but it is sensitive to the initialization of its variables. In addition, the standard SNMF algorithm uses the sum of squared errors to measure the quality of the factorization, which makes it sensitive to noise and outliers. To solve these problems, a robust adaptive symmetric nonnegative matrix factorization clustering algorithm (RS3NMF) is proposed from the perspective of ensemble learning. Furthermore, the discriminative ability of the projection matrix is enhanced by incorporating the label information of the training set, and a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF) is proposed by integrating robustness, adaptive graph learning and label information into the SNMF framework. The main research contents of this thesis comprise the following two parts. (1) Inspired by robust nonnegative matrix factorization, adaptive methods and ensemble learning, a robust adaptive symmetric nonnegative matrix factorization clustering algorithm (RS3NMF) is constructed, which integrates robustness into the SNMF framework. The norm-based RS3NMF model alleviates the influence of noise and outliers, maintains invariance to feature rotation and improves the robustness of the model. At the same time, without any additional information, the clustering performance is gradually enhanced by exploiting the sensitivity of SNMF to initialization. Alternating iteration is used for optimization, and convergence of the objective function value is guaranteed. Extensive experimental results show that the proposed RS3NMF algorithm is superior to other advanced algorithms and is strongly robust. In addition, an application to GDP data for 31 provinces and municipalities in China shows that the robust clustering algorithm can identify development differences among provinces and has good practical value. (2) Inspired by the spatial clustering self-representation learning method, a robust adaptive learning discriminative symmetric nonnegative matrix factorization algorithm (RADS3NMF) is constructed by introducing a norm, adaptive learning and label information. Specifically, first, the affinity matrix is represented by the obtained self-representation coefficients, and the discriminative ability of the projection matrix is enhanced using the label information of the training set. Second, the model is optimized, an auxiliary function is constructed, the convergence of the model is proved, and the computational complexity of the algorithm is given. Finally, using hourly concentration data for nitrogen dioxide (NO2) in Beijing over a certain period, the algorithm is applied to the cluster analysis of air quality monitoring station siting in Beijing. The results show that the RADS3NMF algorithm identifies the spatial layout of the air quality monitoring stations well and has good applicability. |
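Degree type | Master's |
Defense date | 2023-05 |
Place of degree conferral | Lanzhou, Gansu Province |
Language | Chinese |
Total pages | 59 |
Total references | 45 |
Library holding number | 0004819 |
Confidentiality level | Public |
Chinese Library Classification number | O212/29 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/34335 |
Collection | School of Statistics and Data Science (统计与数据科学学院) |
Recommended citation (GB/T 7714) | 刘万金. 基于对称非负矩阵分解的鲁棒聚类算法研究[D]. 甘肃省兰州市. 兰州财经大学,2023. |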
Files in this item |
File name/size | Document type | Version type | Access type | License |
基于对称非负矩阵分解的鲁棒聚类算法研究.(6061KB) | Thesis | | Open access | CC BY-NC-SA |
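The abstract describes standard SNMF as minimizing the sum of squared errors between a similarity matrix and its factorization, and as being sensitive to initialization. As a rough illustration of that baseline only, the sketch below factors a symmetric nonnegative matrix A as H Hᵀ using a widely used damped multiplicative update with damping factor 0.5. It is not the RS3NMF or RADS3NMF algorithm from the thesis, whose update rules are not given in this record. The function names, the toy similarity matrix, the seed, and the iteration count are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def frobenius_loss(A, H):
    """Squared-error objective ||A - H H^T||_F^2 of standard SNMF."""
    R = matmul(H, transpose(H))
    return sum((a - r) ** 2 for a_row, r_row in zip(A, R) for a, r in zip(a_row, r_row))


def snmf(A, k, iters=1000, seed=0, beta=0.5, eps=1e-12):
    """Approximate a symmetric nonnegative A by H H^T with H >= 0 (n x k).

    Assumed update (damped multiplicative rule, not taken from the thesis):
        H <- H * (1 - beta + beta * (A H) / (H H^T H))
    """
    rng = random.Random(seed)
    n = len(A)
    # Random positive start; the result depends on this choice, which is
    # the initialization sensitivity the abstract mentions.
    H = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        H = [[h * (1 - beta + beta * num / (den + eps))
              for h, num, den in zip(h_row, num_row, den_row)]
             for h_row, num_row, den_row in zip(H, AH, HHtH)]
    return H


def cluster_labels(H):
    """Assign each sample to the column of H with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


if __name__ == "__main__":
    # Hypothetical similarity matrix: two groups of three samples each,
    # strongly similar within a group and weakly similar across groups.
    A = [[1.0 if (i < 3) == (j < 3) else 0.05 for j in range(6)] for i in range(6)]
    H = snmf(A, k=2)
    print(cluster_labels(H), frobenius_loss(A, H))
```

Because the objective is a plain squared error, a single corrupted entry of A contributes quadratically to the loss, which is the motivation the abstract gives for replacing it with a more robust norm.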