Institutional Repository of School of Statistics
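Author | 吴桐 |
Name in Hanyu Pinyin | Wu Tong |
Student ID | 2021000003036 |
Training institution | Lanzhou University of Finance and Economics (兰州财经大学) |
Phone | 18916119980 |
Email | 920698959@qq.com |
Enrollment date | 2021-9 |
Degree category | Professional master's |
Training level | Master's student |
First-level discipline | Applied Statistics |
Discipline code | 0252 |
Degree conferred | Master of Economics |
First supervisor | 牛成英 |
First supervisor's name in Hanyu Pinyin | Niu Chengying |
First supervisor's institution | Lanzhou University of Finance and Economics (兰州财经大学) |
First supervisor's academic title | Professor |
Title | 基于LASSO的PSM变量选择与应用 |
English title | PSM Variable Selection And Application Based on LASSO |
Keywords | LASSO; propensity score matching (PSM); LASSO-PSM; variable selection; job type |
Foreign-language keywords | LASSO; Propensity Score Matching (PSM); LASSO-PSM; Variable Selection; Job Type |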
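Abstract | Randomized experiments are the gold standard for causal effect analysis under the counterfactual framework, but in empirical research based on observational data the sample units often cannot, for various reasons, satisfy the requirement of random assignment. Propensity score matching (PSM) is a common method for processing study data into "randomized controlled experiment data"; it aims to reduce bias in observational data and interference from confounding factors, and it is now widely used in many fields. With high-dimensional data, however, the specification of the propensity score model directly affects the balance of the matched treatment and control samples. In particular, adding too many variables in order to control potential confounders, improve matching quality, and make the model more robust introduces correlation among the variables and brings problems such as the curse of dimensionality, differences in support, and multiple testing, which ultimately lead to poorly balanced matches and unreliable causal effect estimates. PSM therefore requires a sensible choice of model variables to make the matching results more reliable. Among variable selection methods, the distinctive advantage of the least absolute shrinkage and selection operator (LASSO) is its ability to select features automatically. Applying this advantage to PSM yields a LASSO-based PSM model, the LASSO-PSM model, which addresses the subjectivity of model specification and the curse of dimensionality in traditional propensity score matching. The results show that the LASSO-PSM model is feasible and that its matching results are better balanced than those of the PSM model. The LASSO-PSM model is then applied to an empirical study of job-type preferences and job-choice factors: it is used to screen the variables related to workers' occupational preferences, and the screened variables are then used to compute the causal effects of job type on workers' living conditions (income, psychological stress, health status, and sense of well-being). The study finds that the variables selected by the LASSO-PSM model are more meaningful in practice and that income differs significantly among workers in different types of jobs. |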
English abstract | Randomized experiments are considered the gold standard for causal effect analysis within the counterfactual framework. In empirical studies based on observational data, however, the sample units often cannot meet the requirements of random assignment. Propensity score matching (PSM) is a commonly used method for processing study data into "randomized controlled trial data," with the aim of reducing bias and interference from confounding factors in observational data, and it is now widely applied in many fields. With high-dimensional data, the specification of the propensity score matching model directly affects the balance of the matching results between the treatment and control groups. In particular, adding too many variables in order to control potential confounders, improve matching quality, and enhance model robustness introduces correlation among the variables. This in turn causes problems such as the curse of dimensionality, differences in support, and multiple testing, and ultimately leads to poorly balanced matching results and unreliable estimates of causal effects. When using PSM, it is therefore essential to select model variables carefully to make the matching results more reliable. Among variable selection methods, the distinctive advantage of the Least Absolute Shrinkage and Selection Operator (LASSO) is its ability to select features automatically. By bringing this advantage into PSM, a LASSO-based PSM model, the LASSO-PSM model, is proposed to address the subjectivity of model specification and the curse of dimensionality in traditional propensity score matching. The results indicate that the LASSO-PSM model is feasible and that its matching results are better balanced than those of the PSM model.
In an empirical study of job-type preferences and the factors influencing job choice, the LASSO-PSM model was applied to select the variables related to workers' job preferences. The selected variables were then used to estimate the causal effects of job type on workers' living conditions (income, psychological stress, health status, and sense of happiness). The findings indicate that the variables selected by the LASSO-PSM model are more meaningful in practice and that income differs significantly among workers in different types of jobs. |
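Degree type | Master's |
Defense date | 2024-05-25 |
Place of degree conferral | Lanzhou, Gansu Province |
Research direction | Big data analysis |
Language | Chinese |
Total pages | 67 |
Total references | 64 |
Collection number | 0005637 |
Confidentiality level | Public |
Chinese Library Classification number | C8/413 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/36689 |
Collection | School of Statistics and Data Science |
Recommended citation (GB/T 7714) | 吴桐. 基于LASSO的PSM变量选择与应用[D]. 甘肃省兰州市. 兰州财经大学,2024. |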
Files in this item |
File name/size | Document type | Version type | Access type | License |
2021000003036.pdf (1072KB) | Thesis | | Open access | CC BY-NC-SA |
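The two-stage idea described in the abstract (screen covariates with LASSO, then match on a propensity score built from the screened covariates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the simulated data, the helper names `fit_logistic` and `lasso_psm_att`, the penalty value `lam=0.05`, the refit of an unpenalized model on the selected covariates, and the use of 1-nearest-neighbor matching with replacement. The thesis may use different software, tuning, and matching rules.

```python
import math
import random


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_logistic(X, t, lam, step=0.5, n_iter=300):
    """Logistic regression with an L1 penalty lam on the slopes, fitted by
    proximal gradient descent. lam=0 gives an ordinary (unpenalized) fit.
    The fixed step size assumes roughly standardized covariates."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, ti in zip(X, t):
            r = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi))) - ti
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0  # the intercept is not penalized
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta


def lasso_psm_att(X, t, y, lam):
    """Select covariates with LASSO, refit the propensity model on them,
    then estimate the ATT by 1-nearest-neighbor matching with replacement."""
    _, beta = fit_logistic(X, t, lam)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    Xs = [[xi[j] for j in selected] for xi in X]
    b0, bs = fit_logistic(Xs, t, 0.0)
    scores = [sigmoid(b0 + sum(b * x for b, x in zip(bs, xi))) for xi in Xs]
    controls = [i for i, ti in enumerate(t) if ti == 0]
    diffs = []
    for i, ti in enumerate(t):
        if ti == 1:
            # closest control unit on the estimated propensity score
            m = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            diffs.append(y[i] - y[m])
    return selected, scores, sum(diffs) / len(diffs)


# Simulated data (hypothetical): covariates 0 and 1 drive both treatment and
# outcome, covariates 2-4 are pure noise, and the true effect is 2.0.
rng = random.Random(0)
n, p = 400, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
t = [1 if rng.random() < sigmoid(xi[0] - xi[1]) else 0 for xi in X]
y = [2.0 * ti + xi[0] - xi[1] + rng.gauss(0, 0.5) for xi, ti in zip(X, t)]

selected, scores, att = lasso_psm_att(X, t, y, lam=0.05)
print("selected covariates:", selected)
print("estimated ATT:", round(att, 2))
```

In this sketch the penalty only decides which covariates enter the propensity model; the scores themselves come from the unpenalized refit, so that shrinkage does not distort them. In practice the penalty would normally be chosen by cross-validation, and covariate balance would be checked after matching.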
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.