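Author: 马文娟
Name (Hanyu Pinyin): Ma Wenjuan
Student ID: 2021000003077
Institution: Lanzhou University of Finance and Economics
Phone: 18409487231
Email: mawenjuan0503@163.com
Year of enrollment: 2021-09
Degree category: Academic master's degree
Training level: Master's student
Discipline category: Science
First-level discipline: Statistics
Research direction: Mathematical statistics
Discipline code: 0714Z3
Degree conferred: Master of Science
First supervisor: 高海燕
First supervisor (Hanyu Pinyin): Gao Haiyan
First supervisor's institution: Lanzhou University of Finance and Economics
First supervisor's title: Professor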
Title: 基于非负矩阵分解的函数型矩阵填充方法研究与应用
English title: Research and Application of Functional Matrix Completion Method Based on Non-negative Matrix Factorization
Keywords: Functional data analysis; Matrix completion; Non-negative matrix factorization; Multi-view learning; Missing-data imputation

English Abstract

Data reconstruction, the process of recovering original data from partially observed data, is the core task of missing-data handling, and its key lies in making full and appropriate use of the characteristics of the data. With the development of functional data analysis, functional matrix completion has become one of the mainstream approaches to data reconstruction. However, existing functional matrix completion methods do not fully exploit latent features of the data during learning, such as the correlation among sample curves and the high-order neighborhood information of samples. To address these issues, this thesis works within the framework of functional data analysis and, by introducing non-negativity constraints and using non-negative matrix factorization (NMF) together with class information, proposes a non-negative functional matrix completion method with class information (CNFMC). In addition, it proposes a multi-view non-negative functional matrix completion method based on graph regularization (GMVNFMC), which estimates missing data by exploiting the high-order neighborhood relationships among samples and the diversity across views. The main research content of this thesis consists of the following two parts:

(1) A non-negative functional matrix completion method with class information (CNFMC) is proposed. A functional matrix completion method is first constructed on the basis of non-negative matrix factorization; sample class information is then introduced through cluster partitioning, missing values are imputed using the correlation among samples within the same class, and the final imputed values are computed by a self-weighted ensemble learning algorithm that assigns weights dynamically. Simulated missing-data imputation experiments were conducted on the public traffic dataset PeMS, and an empirical application was carried out on air quality data with missing values. The results show that, compared with 10 imputation methods including the K-nearest neighbor algorithm, MICE, and PACE, CNFMC achieves high imputation accuracy, good robustness, and strong applicability at a controllable computational cost, ensuring effective and accurate imputation.

(2) A multi-view functional matrix completion method based on graph regularization (GMVNFMC) is proposed. By introducing optimal graph regularization, the method fully accounts for the high-order neighborhood relationships among samples within each view, reducing information loss; meanwhile, it uses the Hilbert-Schmidt independence criterion to explore the complementary information across different views, thereby improving imputation accuracy. Simulated imputation experiments and an empirical application were conducted on air pollutant datasets, and the results show that GMVNFMC achieves better imputation performance than other mainstream imputation methods.
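The abstract names two computational building blocks without showing them: completing a non-negative matrix from its observed entries via NMF, and measuring dependence between views with the Hilbert-Schmidt independence criterion (HSIC). The NumPy sketch below illustrates both under stated assumptions; it is not the thesis's CNFMC or GMVNFMC algorithm. The function names `masked_nmf_complete` and `hsic`, the rank, iteration count, Gaussian-kernel bandwidth, and synthetic data are hypothetical choices made for illustration, and the sketch omits the functional basis expansion, clustering, self-weighted ensemble, and graph-regularization steps described above.

```python
import numpy as np


def masked_nmf_complete(X, mask, rank=2, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical helper (illustration only): impute missing cells of a
    non-negative matrix with a rank-`rank` NMF fitted to observed cells only.

    Minimizes ||mask * (X - W @ H)||_F^2 with multiplicative updates,
    which keep W and H non-negative at every step.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xo = np.where(mask > 0, X, 0.0)      # missing cells (e.g. NaN) -> 0; the mask ignores them
    W = rng.random((n, rank)) + eps      # non-negative random initialization
    H = rng.random((rank, p)) + eps
    for _ in range(n_iter):
        W *= (Xo @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ Xo) / (W.T @ (mask * (W @ H)) + eps)
    # Keep observed values; fill missing cells from the low-rank reconstruction.
    return np.where(mask > 0, X, W @ H)


def hsic(Z1, Z2, sigma=1.0):
    """Hypothetical helper: biased empirical HSIC between two views with
    Gaussian kernels, tr(K C L C) / (n - 1)^2, where C = I - 11^T / n
    is the centering matrix."""
    n = Z1.shape[0]

    def gaussian_gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)
        return np.exp(-d2 / (2 * sigma ** 2))

    K, L = gaussian_gram(Z1), gaussian_gram(Z2)
    C = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ C @ L @ C) / (n - 1) ** 2


# --- Usage on synthetic data (all values are illustrative assumptions) ---
rng = np.random.default_rng(1)
X_full = rng.random((30, 2)) @ rng.random((2, 20))        # exactly rank 2, non-negative
mask = (rng.random(X_full.shape) > 0.2).astype(float)     # about 20% of cells missing
X_obs = np.where(mask > 0, X_full, np.nan)

X_hat = masked_nmf_complete(X_obs, mask, rank=2, n_iter=2000)
col_means = np.nanmean(X_obs, axis=0)                     # baseline: column-mean imputation
X_mean = np.where(mask > 0, X_obs, col_means[None, :])
missing = mask == 0
rmse_nmf = np.sqrt(np.mean((X_hat - X_full)[missing] ** 2))
rmse_mean = np.sqrt(np.mean((X_mean - X_full)[missing] ** 2))

z1 = rng.normal(size=(100, 1))
z2 = z1 + 0.1 * rng.normal(size=(100, 1))                 # view strongly dependent on z1
z3 = rng.normal(size=(100, 1))                            # independent view
print(rmse_nmf, rmse_mean, hsic(z1, z2), hsic(z1, z3))
```

The masked updates fit only the observed cells, so missing cells are filled by the low-rank reconstruction `W @ H`; a larger HSIC value indicates stronger dependence between two views. Methods like those summarized above extend such an objective with further terms (for example a graph-regularization term or a cross-view diversity term); the exact formulations are given in the thesis itself.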

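Degree type: Master's
Defense date: 2024-05-25
Place of degree conferral: Lanzhou, Gansu Province
Language: Chinese
Total pages: 83
Total references: 71
Call number: 0005678
Confidentiality level: Public
Chinese Library Classification (CLC) number: O212/36
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/36859
Collection: School of Statistics and Data Science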
Recommended citation (GB/T 7714):
马文娟. 基于非负矩阵分解的函数型矩阵填充方法研究与应用[D]. 甘肃省兰州市: 兰州财经大学, 2024.
Files in this item: 2021000003077.pdf (10094 KB); document type: thesis; access: open access; license: CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.