Research and Application of Multi-view Functional Clustering Algorithm Based on Non-Negative Matrix Factorization

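Author	程莞莞
Name (Hanyu Pinyin)	Cheng Wanwan
Student ID	2021000003004
Institution	Lanzhou University of Finance and Economics
Telephone	13243172540
Email	chengwanwana@163.com
Year of enrollment	2021-9
Degree category	Professional master's degree
Level of study	Master's student
First-level discipline	Applied Statistics
Discipline code	0252
Degree conferred	Master of Applied Statistics (professional degree)
First advisor	高海燕
First advisor (Hanyu Pinyin)	Gao Haiyan
First advisor's institution	Lanzhou University of Finance and Economics
First advisor's title	Professor
Title	Research and Application of Multi-view Functional Clustering Algorithm Based on Non-Negative Matrix Factorization
English title	Research and Application of Multi-view Functional Clustering Algorithm Based on Non-Negative Matrix Factorization
Keywords	Functional clustering; non-negative matrix factorization; multi-view learning; robustness; co-orthogonality
Foreign-language keywords	Functional clustering; Non-Negative Matrix Factorization; Multi-view learning; Robust; Co-orthogonal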
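Abstract	Advances in data collection technology have produced functional data, which are infinite-dimensional and continuous, and this has motivated the study of methods for functional data analysis, among which clustering methods for functional data have attracted wide attention. Most existing multivariate functional clustering methods first "fuse" the univariate functional data and then cluster, a strategy that struggles to uncover deeper information shared among the variables. Multi-view learning from machine learning, by contrast, aggregates information well and yields more comprehensive clustering results. In addition, non-negative matrix factorization is widely used in clustering research because it is highly interpretable and simple to solve. Some scholars have combined multi-view learning with non-negative matrix factorization for clustering. Inspired by this work, and within the framework of functional data analysis, this thesis combines multi-view learning with functional clustering on the basis of non-negative matrix factorization and proposes two multi-view functional clustering algorithms. The aim is for these algorithms to reveal the intrinsic structure and characteristics of functional data effectively and to offer new ideas for research in related fields. The specific research content is as follows: (1) For functional data containing noise and outliers, a robust multi-view functional clustering algorithm based on graph-regularized non-negative matrix factorization is constructed. The algorithm uses the l2,1 norm and introduces a graph regularization term to preserve the intrinsic geometric structure of the low-rank data matrix and improve performance. The objective function is optimized by alternating iteration; the iterative update rules and the algorithm procedure are given, convergence is proved, and computational complexity is discussed. Experiments on randomly simulated datasets and the Growth dataset show that the method improves clustering performance while remaining robust. Applying it to identify the spatial layout of air quality monitoring stations in Beijing shows that the method has some practical value. (2) To address the high dimensionality and large volume of functional data, a robust co-orthogonal multi-view functional clustering algorithm is constructed. It uses the l2,1 norm, introduces graph regularization, accounts for the local geometric features of the data, and integrates heterogeneous multi-view features; it also constrains the factor matrices, exploiting the orthogonality of the representation matrix and the basis matrix to improve clustering performance. The model is optimized by alternating iteration, the algorithm procedure is given, and convergence is proved with the auxiliary function method. Experiments on randomly simulated datasets, the Growth dataset, and the TIMIT speech dataset show that the method effectively improves clustering performance. An application to meteorological data for the administrative divisions of Gansu Province shows that the method is broadly applicable.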
English abstract	With the progress of data collection technology, functional data with infinite-dimensional and continuous characteristics have appeared. This has led to the exploration of functional data analysis methods, among which clustering methods for functional data have received widespread attention. Most existing multivariate functional clustering methods adopt the strategy of "fusing" the univariate functional data before clustering, which makes it difficult to mine the deeper information among variables. Multi-view learning in the field of machine learning, however, has excellent aggregation performance, and its clustering results are more comprehensive. In addition, non-negative matrix factorization is widely used in clustering because of its strong interpretability and simple solution methods. Some scholars have combined multi-view learning with non-negative matrix factorization to carry out clustering research. Inspired by this, within the framework of functional data analysis, this thesis uses non-negative matrix factorization to combine multi-view learning with functional clustering and proposes two multi-view functional clustering algorithms. They are expected to reveal the internal structure and characteristics of functional data effectively and to bring new inspiration to research in related fields. The specific research content of this thesis is as follows: (1) A robust multi-view functional clustering algorithm based on graph-regularized non-negative matrix factorization is constructed for functional data with noise and outliers. The algorithm employs the l2,1 norm and introduces a graph regularization term to maintain the intrinsic geometric structure of the data and improve the performance of the algorithm. An alternating iteration method is used to optimize the objective function, and the iterative update rules and the algorithm procedure are given. The convergence of the algorithm is proved, and its computational complexity is discussed. Experiments on randomly simulated datasets and the Growth dataset demonstrate that the method improves clustering performance while remaining robust. When applied to identify the spatial layout of air quality monitoring stations in Beijing, the results indicate that the method has some practical significance. (2) To address the high dimensionality and large volume of functional data, a robust co-orthogonal multi-view functional clustering algorithm is devised. The algorithm adopts the l2,1 norm, introduces graph regularization, considers the local geometric characteristics of the data, and integrates heterogeneous multi-view features. At the same time, constraints are added to the factor matrices, and the orthogonality of the representation matrix and the basis matrix is used to improve clustering performance. The alternating iteration method is used to optimize the model, the algorithm procedure is given, and the auxiliary function method is used to prove convergence. Experiments on randomly simulated datasets, the Growth dataset, and the TIMIT speech dataset show that the proposed method effectively improves clustering performance. A practical application to meteorological data for the administrative divisions of Gansu Province shows that the method has good applicability.
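Degree type	Master's
Defense date	2024-05-25
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	70
Total references	78
Collection number	0005605
Confidentiality level	Public
Chinese Library Classification number	C8/381
Document type	Thesis
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/36838
Collection	School of Statistics and Data Science
Recommended citation (GB/T 7714)	Cheng Wanwan. Research and Application of Multi-view Functional Clustering Algorithm Based on Non-Negative Matrix Factorization[D]. Lanzhou, Gansu Province: Lanzhou University of Finance and Economics, 2024.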

Files in this item
File name/size	Document type	Version type	Access type	License
2021000003004.pdf (3294KB)	Thesis		Open access	CC BY-NC-SA
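
The abstract describes the first algorithm only at a high level: an l2,1-norm loss for robustness, a graph regularization term, and alternating iterative updates. The sketch below is not the thesis's algorithm. It is a minimal single-view illustration of that general family, assuming the curves have already been discretized (or expanded in a basis) into a non-negative matrix `X` whose columns are samples. The objective `||X - U V^T||_{2,1} + lam * Tr(V^T L V)`, the k-nearest-neighbour graph, the multiplicative updates, and every function and parameter name are assumptions chosen for illustration; the multi-view consensus and orthogonality constraints mentioned in the abstract are omitted.

```python
import numpy as np

def knn_affinity(X, n_neighbors=5):
    """Symmetric 0/1 k-nearest-neighbour affinity between the columns of X."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # squared distances
    np.fill_diagonal(dist, np.inf)                    # exclude self-matches
    idx = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(W, W.T)

def objective(X, U, V, W, lam):
    """l2,1 reconstruction loss (sum of column norms) plus graph penalty."""
    L = np.diag(W.sum(axis=1)) - W                    # graph Laplacian
    loss = np.linalg.norm(X - U @ V.T, axis=0).sum()
    return loss + lam * np.trace(V.T @ L @ V)

def robust_graph_nmf(X, k, W, lam=0.1, n_iter=200, eps=1e-10, seed=0):
    """Alternating multiplicative updates for U (m x k) and V (n x k)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps
    deg = W.sum(axis=1)                               # node degrees
    history = [objective(X, U, V, W, lam)]
    for _ in range(n_iter):
        # Reweighting: columns with large residuals (outliers) get small weight.
        resid = np.linalg.norm(X - U @ V.T, axis=0)
        d = 1.0 / (2.0 * np.maximum(resid, eps))
        XD = X * d                                    # scales column j by d[j]
        U *= (XD @ V) / (U @ (V.T * d) @ V + eps)
        num = XD.T @ U + lam * (W @ V)
        den = (V * d[:, None]) @ (U.T @ U) + lam * deg[:, None] * V + eps
        V *= num / den
        history.append(objective(X, U, V, W, lam))
    return U, V, history

# Toy data: 30 non-negative "curves" sampled at 20 points, two groups.
rng = np.random.default_rng(1)
centers = np.array([[1.0] * 10 + [4.0] * 10, [4.0] * 10 + [1.0] * 10]).T
labels = np.repeat([0, 1], 15)
X = np.abs(centers[:, labels] + 0.3 * rng.standard_normal((20, 30)))
W = knn_affinity(X, n_neighbors=5)
U, V, history = robust_graph_nmf(X, k=2, W=W)
pred = V.argmax(axis=1)  # one simple way to read cluster labels off V
```

Reading labels from the largest entry in each row of `V` is one common convention; running k-means on the rows of `V` is another. Which of these the thesis uses is not stated in this record.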