Quantile Regression Research Based on Stratified Data

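Author	Yang Xiaobo (杨小卜)
Name in Hanyu Pinyin	Yang Xiaobo
Student ID	2020000003014
Degree-granting institution	Lanzhou University of Finance and Economics
Phone	18102053823
Email	18102053823@163.com
Enrollment date	2020-09
Degree category	Academic master's degree
Level of study	Master's student
Discipline category	Science
First-level discipline	Statistics
Research area	Mathematical Statistics
Discipline code	0714Z3
First supervisor	Guo Jingjun (郭精军)
First supervisor's name in Hanyu Pinyin	Guo Jingjun
First supervisor's institution	Lanzhou University of Finance and Economics
First supervisor's title	Professor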
Title	基于分层数据的分位数回归研究
English title	Quantile Regression Research Based on Stratified Data
Keywords	卷积; 分层数据; 分位数回归; 缺失数据; 变量选择
English keywords	Convolution; Stratified data; Quantile regression; Missing data; Variable selection
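Abstract (translated from Chinese)	In everyday life, people encounter data of many kinds from highly diverse sources. To uncover and exploit the potential value of data, statistical models must be built to suit the characteristics of the data. With the arrival of the big-data era, growing data volumes have brought data with complex structure, such as stratified data. Current research on stratified data focuses on extending models: from the stratified least-squares regression model to the stratified quantile regression model and the stratified logistic regression model. Although these models broaden the range of data that can be analyzed, two problems remain. First, they assume complete data and do not consider missing data, a situation closer to real life. Second, the loss function of the quantile model is not differentiable, which lowers the estimation accuracy of the stratified quantile regression model; in addition, when solving for its penalty parameters, the model applies a different algorithm to each of the two penalty parameters, which lowers estimation efficiency. This thesis therefore studies the modeling of data with stratified features with respect to these two problems, in two parts. (1) It studies estimation for the stratified quantile regression model when the response variable is missing at random, and when the response is missing at random under heteroscedasticity. First, inverse probability weighting is used to handle the randomly missing responses, and the LASSO penalty and the ADALASSO penalty, which satisfies the Oracle property, are used for dimension reduction; estimators of the regression coefficients are constructed and the asymptotic properties of the parameter estimators are proved. Second, the asymptotic properties of the parameter estimators are likewise proved under the assumptions of heteroscedasticity and a response missing at random. Finally, Monte Carlo simulation and an empirical analysis of real human gene data show that the proposed method performs well. (2) To address the non-differentiable loss function of the stratified quantile regression model, which lowers estimation accuracy, a stratified convolution-smoothed quantile regression model is proposed. First, a kernel function is used to smooth the non-differentiable quantile loss by convolution, turning it into a differentiable convex function with good properties. Second, the LASSO penalty is used for dimension reduction, and a multi-step ADMM algorithm is used for parameter estimation. The penalty parameters of the model are, after a transformation of the formulas, selected by cross-validation (CV). Finally, numerical simulation and an empirical analysis show that, for data with sharp-peaked, heavy-tailed distributions, the proposed method is more effective than the stratified quantile regression model and the stratified linear regression model, and that its estimation efficiency is clearly better than that of the stratified quantile regression model.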
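The abstract names its building blocks without stating formulas, so the following block writes out the textbook forms they usually take. It is a sketch under assumptions, not the thesis's own notation: the symbols $\rho_\tau$, $K_h$, $\ell_h$, $\delta_i$, $\pi(x_i)$, $\beta$ and $\lambda$ are introduced here for illustration, the Gaussian kernel is one possible choice, and the stratified structure and the second penalty parameter of the thesis model are not represented.

```latex
% Check (quantile) loss at level 0 < \tau < 1; not differentiable at u = 0.
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)

% Convolution smoothing with a kernel K and bandwidth h > 0.
K_h(u) = \frac{1}{h} K\!\left(\frac{u}{h}\right), \qquad
\ell_h(u) = (\rho_\tau * K_h)(u) = \int_{-\infty}^{\infty} \rho_\tau(v)\, K_h(v - u)\, dv

% Assumed Gaussian kernel: closed form and derivative
% (\phi and \Phi are the standard normal density and distribution function).
\ell_h(u) = h\,\phi\!\left(\frac{u}{h}\right) + u\,\Bigl(\tau - \Phi\!\left(-\frac{u}{h}\right)\Bigr), \qquad
\ell_h'(u) = \tau - \Phi\!\left(-\frac{u}{h}\right)

% Inverse-probability-weighted LASSO objective when the response y_i is
% observed (\delta_i = 1) with probability \pi(x_i) and missing otherwise.
\hat\beta = \arg\min_{\beta}\;
  \frac{1}{n} \sum_{i=1}^{n} \frac{\delta_i}{\pi(x_i)}\, \rho_\tau\!\bigl(y_i - x_i^{\top}\beta\bigr)
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Replacing $\rho_\tau$ by $\ell_h$ in the last objective gives a differentiable convex loss term, which is the property the second part of the abstract relies on.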
English abstract	In daily life, people encounter a wide variety of data from highly diverse sources. To explore and exploit the potential value of data, statistical models must be built according to the characteristics of the data. With the advent of the big-data era, the growth in data volume has given rise to data with complex structure, such as stratified data. Current research on stratified data focuses on generalizing models: from the stratified least-squares regression model to the stratified quantile regression model and the stratified logistic regression model. Although these models broaden the range of applications of the data, some problems remain. The models assume complete data and do not consider missing data, a situation closer to real life. The loss function of the stratified quantile regression model is not differentiable, which reduces estimation accuracy. In solving for the penalty parameters, the stratified quantile regression model uses a different algorithm for each of the two penalty parameters, which reduces estimation efficiency. This thesis therefore studies the modeling of data with stratified characteristics. The main research content is divided into two parts. (1) For stratified data, the estimation of the stratified quantile regression model is studied when the response variable is missing at random, and when it is missing at random under heteroscedasticity. First, inverse probability weighting is used to handle the randomly missing responses and the LASSO penalty function is used for dimension reduction; estimators of the regression coefficients are constructed and their asymptotic properties are proved. Second, the asymptotic properties of the parameter estimators are also proved under heteroscedasticity with the response missing at random. Finally, Monte Carlo simulation and an analysis of real human genetic data show that the proposed method performs well. (2) Because the loss function of the stratified quantile regression model is non-differentiable, which reduces estimation accuracy, a stratified convolution-smoothed quantile regression model is proposed. First, a kernel function is used to smooth the non-differentiable quantile loss function by convolution, so that it becomes a differentiable convex function with good properties. Second, the LASSO penalty function is used for dimension reduction and a multi-step ADMM algorithm is used for parameter estimation. The penalty parameters of the model are, after a transformation of the formulas, selected by cross-validation (CV). Finally, numerical simulation and a case analysis show that the proposed method is more effective than the stratified quantile regression model and the stratified linear regression model for data with sharp-peaked, heavy-tailed distributions, and that its estimation efficiency is clearly superior to that of the stratified quantile regression model.
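To make the two ideas in the abstract concrete, here is a minimal Python sketch that combines inverse probability weighting for a response missing at random with a convolution-smoothed quantile loss and a LASSO penalty. It is an illustration under stated assumptions, not the thesis's method: it uses a Gaussian kernel, treats the response probabilities as known, ignores the stratified structure, and swaps in plain proximal gradient descent (ISTA) for the multi-step ADMM algorithm. All function names, parameters, and the simulated data are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])

def norm_pdf(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    x = np.asarray(x, dtype=float)
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def smoothed_check_loss(u, tau, h):
    """Check loss convolved with a Gaussian kernel of bandwidth h (closed form)."""
    u = np.asarray(u, dtype=float)
    return h * norm_pdf(u / h) + u * (tau - norm_cdf(-u / h))

def smoothed_check_grad(u, tau, h):
    """Derivative of the smoothed loss with respect to the residual u."""
    u = np.asarray(u, dtype=float)
    return tau - norm_cdf(-u / h)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (the LASSO shrinkage step)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fit_ipw_smoothed_qr(X, y, observed, prob_observed,
                        tau=0.5, h=0.5, lam=0.03, n_iter=500):
    """IPW-weighted, smoothed, LASSO-penalized quantile regression via ISTA."""
    n, p = X.shape
    w = observed / prob_observed              # weight 0 for missing responses
    y_filled = np.where(observed == 1, y, 0.0)  # placeholder; weight is 0 there
    Xw = X * w[:, None]
    # Upper bound on the Lipschitz constant of the gradient of the loss term:
    # the second derivative of the smoothed loss is at most phi(0) / h.
    lipschitz = np.linalg.eigvalsh(Xw.T @ X / n)[-1] * float(norm_pdf(0.0)) / h
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        resid = y_filled - X @ beta
        grad = -Xw.T @ smoothed_check_grad(resid, tau, h) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

def demo(seed=0, n=1000):
    """Simulate heavy-tailed data with responses missing at random, then fit."""
    rng = np.random.default_rng(seed)
    beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.0])
    X = rng.standard_normal((n, beta_true.size))
    y = X @ beta_true + rng.standard_t(3, size=n)   # heavy-tailed noise
    # Assumed missingness model: depends on the first covariate only.
    prob_observed = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * X[:, 0])))
    observed = (rng.random(n) < prob_observed).astype(float)
    y = np.where(observed == 1, y, np.nan)          # hide the missing responses
    beta_hat = fit_ipw_smoothed_qr(X, y, observed, prob_observed)
    return beta_true, beta_hat

if __name__ == "__main__":
    truth, estimate = demo()
    print("true coefficients:     ", truth)
    print("estimated coefficients:", np.round(estimate, 3))
```

In practice the response probabilities would be estimated (for example with a logistic regression on the covariates) and the bandwidth `h` and penalty `lam` would be tuned, for instance by cross-validation as the abstract describes; the fixed values above are placeholders.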
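Degree type	Master's
Defense date	2023-05-20
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	53
Total references	51
Holdings number	0004822
Confidentiality level	Public
Chinese Library Classification number	O212/32
Document type	Thesis
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/34142
Collection	School of Statistics and Data Science
Recommended citation (GB/T 7714)	杨小卜. 基于分层数据的分位数回归研究[D]. 甘肃省兰州市. 兰州财经大学,2023.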

Files in this item
File name/size	Document type	Version type	Access type	License
2020000003014_LW.pdf (1696KB)	Thesis		Open access	CC BY-NC-SA