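Author: Zhao Fangfang (赵芳芳)
Name in Hanyu Pinyin: Zhao Fangfang
Student ID: 2021000003047
Institution: Lanzhou University of Finance and Economics
Telephone: 15682912567
Email: 15682912567@163.com
Year of enrollment: 2021-9
Degree category: Professional master's degree
Training level: Master's student
First-level discipline: Applied Statistics
Discipline code: 0252
First advisor: Gao Haiyan (高海燕)
First advisor's name in Hanyu Pinyin: Gao Haiyan
First advisor's institution: Lanzhou University of Finance and Economics
First advisor's title: Professor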
Title: 基于非负矩阵分解的函数型聚类算法研究及应用
English title: Research and Application of Functional Clustering Algorithm Based on Non-Negative Matrix Factorization
Keywords: functional data; cluster analysis; non-negative matrix factorization; robustness; missing values; multi-view learning
Foreign-language keywords: Functional Data; Clustering; Non-negative Matrix Factorization; Robustness; Missing Data; Multi-view Learning
Abstract

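  Higher data-collection frequencies have produced data that are functional in nature, which gave rise to Functional Data Analysis (FDA); within FDA, functional clustering has become an important tool for exploring such data. Most existing work on functional clustering uses a two-step "curve fitting + clustering" approach, which extracts cluster-relevant information poorly and is computationally expensive, reducing both the accuracy and the efficiency of clustering. To address these problems, this thesis applies functional data analysis within the framework of Non-negative Matrix Factorization (NMF) and focuses on one-step functional clustering methods based on NMF. The work has three main parts: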

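  (1) To exploit the nonlinear, low-dimensional manifold structure of the data, a Functional Clustering Algorithm Based on Bi-stochastic Graph Regularized Matrix Factorization (BSMFFC) is proposed. Graph regularization is introduced: a nearest-neighbor graph is built to model the manifold structure, and the graph is updated dynamically through a doubly stochastic matrix, so that the intrinsic geometric structure of the data is used. An optimization algorithm for the model is given and its time complexity is derived. Simulation experiments confirm that the algorithm is feasible, and a real-data application confirms that it is practical.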
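
  The graph construction in part (1) can be illustrated with a minimal NumPy sketch. It is not the BSMFFC algorithm itself, whose update rules are not given in this record. It assumes each curve has already been discretized into a row of `X`, uses a Gaussian k-nearest-neighbor affinity, and swaps in plain Sinkhorn scaling to obtain an approximately doubly stochastic matrix; the names `knn_affinity`, `sinkhorn`, and `graph_penalty` and the parameters `k` and `sigma` are hypothetical.

```python
import numpy as np

def knn_affinity(X, k=5, sigma=1.0):
    """Symmetric Gaussian k-NN affinity between the rows of X (one row per curve)."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                 # exclude self when ranking neighbors
    idx = np.argsort(-W, axis=1)[:, :k]      # k most similar curves per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    mask |= mask.T                           # symmetrize the neighborhood relation
    A = np.where(mask, W, 0.0)
    np.fill_diagonal(A, 1.0)                 # assumption: self-loops keep the scaling well-posed
    return A

def sinkhorn(A, n_iter=500, eps=1e-12):
    """Alternately rescale rows and columns so that both sum to about 1."""
    S = A.astype(float).copy()
    for _ in range(n_iter):
        S /= S.sum(axis=1, keepdims=True) + eps
        S /= S.sum(axis=0, keepdims=True) + eps
    return S

def graph_penalty(V, S):
    """Tr(V^T L V) with L = D - S_sym; small when neighboring curves get similar codes."""
    S_sym = 0.5 * (S + S.T)
    L = np.diag(S_sym.sum(axis=1)) - S_sym
    return float(np.trace(V.T @ L @ V))
```

  In a graph-regularized factorization, a term such as `graph_penalty(V, S)` would be added to the reconstruction loss, with `V` holding one low-dimensional code per curve.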

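  (2) For data containing noise and outliers, a Functional Robust Manifold Nonnegative Matrix Factorization (FRMNMF) algorithm, based on robust graph-regularized matrix factorization, is proposed. The loss function is defined with the l2,1-norm to reduce the influence of noise and outliers, and manifold learning is used to preserve the local invariance of the data. Update rules are given, the convergence of the algorithm is proved, and its computational complexity is analyzed. Experiments on synthetic and real data show that the algorithm has a degree of robustness in functional clustering tasks. In an application to per capita disposable income data for urban residents, the clustering results indicate that the algorithm is feasible, reasonable, and of practical value.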
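
  To make the role of the l2,1-norm in part (2) concrete, here is a minimal sketch of an l2,1-loss NMF with reweighted multiplicative updates. It is an assumption-laden illustration, not FRMNMF: it omits the manifold term and any functional basis expansion, treats each column of `X` as one discretized curve, and the names `l21_norm` and `robust_nmf` are hypothetical.

```python
import numpy as np

def l21_norm(E):
    """Sum of the Euclidean norms of the columns of E (one column per sample)."""
    return float(np.sum(np.linalg.norm(E, axis=0)))

def robust_nmf(X, rank, n_iter=200, eps=1e-10, seed=0):
    """Reduce ||X - F G||_{2,1} over F, G >= 0 by reweighted multiplicative updates."""
    rng = np.random.default_rng(seed)
    F = rng.random((X.shape[0], rank))
    G = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        # Per-sample weights: columns with large residuals (likely outliers)
        # receive small weights, so they pull less on the basis F.
        d = 1.0 / (np.linalg.norm(X - F @ G, axis=0) + eps)
        # X * d equals X @ diag(d); (G * d) @ G.T equals G @ diag(d) @ G.T.
        F *= ((X * d) @ G.T) / (F @ ((G * d) @ G.T) + eps)
        # In the G update the weights cancel column by column, leaving the
        # standard multiplicative rule.
        G *= (F.T @ X) / (F.T @ F @ G + eps)
    return F, G
```

  Because each column's residual enters the loss unsquared, a single corrupted curve contributes linearly rather than quadratically, which is the intuition behind the robustness claim. Cluster labels could be read off as `G.argmax(axis=0)`.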

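  (3) For multi-view functional data with missing values, an Adaptive Incomplete Multi-view Clustering for Functional dataset (AIMFC) algorithm with self-weighting is proposed. It combines multi-view learning, non-negative matrix factorization, and matrix completion, and uses a self-weighting scheme to assign a weight to each view. Update formulas are given, and simulation experiments confirm that the algorithm is feasible. In addition, clustering results on hourly air-pollutant concentration data for Beijing show that the improved algorithm performs well on clustering with missing data.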

English abstract

  The increasing frequency of data collection has led to the emergence of functional data, and with it Functional Data Analysis (FDA), in which functional clustering has become a crucial instrument for exploring such data. Currently, most research on functional clustering adopts a two-step "curve fitting + clustering" method, which extracts optimal category information poorly and has a high computational cost, affecting both the accuracy and the efficiency of clustering. To solve these problems, this thesis adopts functional data analysis under the framework of Non-negative Matrix Factorization (NMF) and focuses on one-step functional clustering methods based on NMF. The main research contents comprise the following three parts:

  (1) To make effective use of the nonlinear, low-dimensional manifold structure of the data, a Functional Clustering Algorithm Based on Bi-stochastic Graph Regularized Matrix Factorization (BSMFFC) is proposed. Using graph regularization, a nearest-neighbor graph is constructed to model the manifold structure, and the graph is dynamically updated with a doubly stochastic matrix, thus making full use of the inherent geometric structure of the data. An optimization algorithm for the model is given and its time complexity is calculated. Simulation results verify the feasibility of the algorithm, and a real-data application verifies its practicality.

  (2) For data containing noise and outliers, a Functional Robust Manifold Nonnegative Matrix Factorization (FRMNMF) algorithm is proposed, based on robust graph-regularized matrix factorization. The l2,1-norm is used to define the loss function to reduce the influence of noise and outliers, and manifold learning is used to preserve the local invariance of the data. Update rules are given, the convergence of the algorithm is proved, and its computational complexity is analyzed. Experimental results on synthetic and real data show that the proposed algorithm has a degree of robustness in functional clustering tasks. Taking per capita disposable income data of urban residents as an example, the clustering results show the feasibility, rationality, and practical value of the algorithm.

  (3) For multi-view functional data with missing values, an Adaptive Incomplete Multi-view Clustering for Functional dataset (AIMFC) algorithm is proposed. Multi-view learning, non-negative matrix factorization, and matrix completion are integrated, and a self-weighting method assigns a weight to each view. Update formulas for the algorithm are given, and its feasibility is verified by simulation experiments. In addition, clustering results on hourly air-pollutant concentration data for Beijing show that the improved algorithm performs well on clustering with missing data.

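Degree type: Master's
Defense date: 2024-05-25
Place of degree conferral: Lanzhou, Gansu Province
Language: Chinese
Total pages: 79
Total references: 79
Collection number: 0005648
Confidentiality level: Public
CLC classification number: C8/424
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/36716
Collection: School of Statistics and Data Science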
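Recommended citation (GB/T 7714):
Zhao Fangfang. Research and Application of Functional Clustering Algorithm Based on Non-Negative Matrix Factorization [D]. Lanzhou, Gansu Province: Lanzhou University of Finance and Economics, 2024.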
Files in this item
File name/size: 2021000003047.pdf (9546 KB)
Document type: Thesis
Access type: Not yet open
License: CC BY-NC-SA