Research on Causal Effect Evaluation Based on Weighted Kernel Norm Matrix Completion

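Author	邢晓文
Name (Hanyu Pinyin)	Xing Xiaowen
Student ID	2021000003082
Institution	Lanzhou University of Finance and Economics
Phone	18338753986
Email	xingxw2021@163.com
Enrollment date	2021-9
Degree category	Academic master's
Training level	Master's student
Discipline category	Science
First-level discipline	Statistics
Research area	Mathematical statistics
Discipline code	0714Z3
First supervisor	牛成英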
First supervisor (Hanyu Pinyin)	Niu Chengying
First supervisor's institution	Lanzhou University of Finance and Economics
First supervisor's title	Professor
Title	基于加权核范数矩阵填充的因果效应评价方法研究
English title	Research on Causal Effect Evaluation Based on Weighted Kernel Norm Matrix Completion
Keywords	因果推断；矩阵填充；加权核范数；随机森林；倾向得分
Keywords (English)	Causal inference; Matrix completion; Weighted kernel norm; Random forest; Propensity score
Abstract	随着大数据科学发展以及因果推断相关研究的不断推进，从观察数据中发现因果关系是各领域研究中的热点问题。在反事实因果效应估计框架下，潜在结果的估计是关键问题。目前，随着机器学习功能变得越来越强大，越来越多的机器学习算法融入因果推断方法中，利用矩阵填充方法填补缺失潜在结果已经被证实是因果推断中一种有效方法。从矩阵填充角度出发，研究在因果推断中如何更好地利用先验信息降低矩阵填充估计潜在结果的误差。主要研究内容包括以下几方面： (1)在不考虑协变量的情况下，填充目标变量的反事实结果。提出用自适应加权核范数替代核范数的加权核范数矩阵填充方法，避免了较大奇异值过度惩罚的情况，并根据交替方向乘子法(Alternating Direction Method of Multipliers，ADMM) 框架证明最优解以及收敛性。从数值模拟和真实数据实验结果来看，相较于传统方法加权核范数模型预测缺失潜在结果精度更高。 (2)在考虑协变量的情况下，提出基于随机森林倾向得分加权矩阵填充的方法，用随机森林模型估计倾向得分能有效避免传统倾向得分方法中模型设定对匹配结果平衡性的影响，在此基础上，利用倾向得分匹配协变量平衡性质加权损失函数，使得处理组和控制组之间具有可比性。从实验结果来看，随机森林倾向得分加权矩阵填充模型估计潜在结果的误差更小，并估计出因果推断关注的平均处理效应。
English abstract	With the development of big data science and continued progress in causal inference research, discovering causal relationships from observational data has become a prominent topic across many fields. Under the counterfactual framework for causal effect estimation, estimating potential outcomes is the central problem. As machine learning grows more capable, more machine learning algorithms are being integrated into causal inference methods, and using matrix completion to fill in missing potential outcomes has been shown to be an effective approach. Starting from the matrix completion perspective, this thesis studies how to make better use of prior information in causal inference to reduce the error of potential outcomes estimated by matrix completion. The main contributions are as follows: (1) Without considering covariates, the thesis completes the counterfactual outcomes of the target variable. It proposes a weighted nuclear norm matrix completion method that replaces the nuclear norm with an adaptively weighted nuclear norm, which avoids over-penalizing large singular values, and it derives the optimal solution and proves convergence within the Alternating Direction Method of Multipliers (ADMM) framework. In numerical simulations and experiments on real data, the weighted nuclear norm model predicts missing potential outcomes more accurately than traditional methods. (2) Taking covariates into account, the thesis proposes a matrix completion method weighted by random forest propensity scores. Estimating propensity scores with a random forest model reduces the dependence of matching balance on model specification that affects traditional propensity score methods. On this basis, the loss function is weighted using the covariate-balancing property of propensity score matching, so that the treatment and control groups are comparable. Experimental results show that the random forest propensity score weighted matrix completion model has a smaller error in estimating potential outcomes, and it yields an estimate of the average treatment effect for the treated, a quantity of interest in causal inference.
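The abstract's point about large singular values can be illustrated with a minimal sketch of weighted singular value thresholding. This is not the thesis's ADMM algorithm, whose update rules and weights are not given in this record. The function names `svt` and `adaptive_weights`, the weight rule `c / (sigma + eps)`, and the parameters `c` and `eps` are all assumptions made for illustration.

```python
import numpy as np

def svt(Z, thresholds):
    """Soft-threshold the singular values of Z.

    thresholds may be a scalar (plain nuclear norm: same shrinkage for all)
    or one value per singular value (weighted nuclear norm).
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - thresholds, 0.0)
    return (U * s_shrunk) @ Vt

def adaptive_weights(Z, c=1.0, eps=1e-6):
    """Assumed weight rule: w_i = c / (sigma_i + eps).

    Large singular values get small weights, so they are shrunk less.
    """
    s = np.linalg.svd(Z, compute_uv=False)
    return c / (s + eps)

# Toy matrix with singular values 10, 2 and 0.1
Z = np.diag([10.0, 2.0, 0.1])
uniform = svt(Z, 1.0)                    # every singular value loses 1.0
weighted = svt(Z, adaptive_weights(Z))   # shrinkage of about 0.1, 0.5 and 10

print(np.linalg.svd(uniform, compute_uv=False))
print(np.linalg.svd(weighted, compute_uv=False))
```

Under this assumed rule, uniform thresholding removes the same amount from every singular value, while the weighted version barely touches the dominant ones and still suppresses the small one.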
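Degree type	Master's
Defense date	2024-05-25
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	62
Total references	72
Collection number	0005683
Confidentiality level	Public
Chinese Library Classification number	O212/41
Document type	Thesis
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/36991
Collection	School of Statistics and Data Science
Recommended citation (GB/T 7714)	邢晓文. 基于加权核范数矩阵填充的因果效应评价方法研究[D]. 甘肃省兰州市: 兰州财经大学, 2024.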

Files in this item
File name/size	Document type	Version type	Access type	License
2021000003082.pdf (1473KB)	Thesis		Open access	CC BY-NC-SA