Institutional Repository of School of Statistics
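Author | Mu Juan (慕娟) |
Name in Hanyu Pinyin | mujuan |
Student ID | 2017000003080 |
Training institution | Lanzhou University of Finance and Economics |
Phone | 18294407827 |
Email | 18294407827@163.com |
Year of enrollment | 2017 |
Degree category | Academic master's |
Level of study | Master's student |
Discipline category | Science |
First-level discipline | Statistics |
Discipline direction | Mathematical statistics |
Discipline code | 0714Z3 |
Degree conferred | Master of Science |
First supervisor | Tian Maozai (田茂再) |
First supervisor's name in Hanyu Pinyin | tianmaozai |
First supervisor's affiliation | Renmin University of China |
First supervisor's title | Professor |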
Title | 高维变点模型自适应 Group Lasso 惩罚分位回归估计 |
English title | Adaptive Group Lasso Penalty Quantile Regression Method for High-Dimensional Change-Point Model |
Keywords | Group Lasso; change-point regression; quantile regression; high-dimensional data; Oracle property |
Abstract | With the rapid development of computer technology, data analysts frequently encounter high-dimensional data. Such data often exhibit pronounced heteroscedasticity, and the predictor variables are often grouped; in biological applications, for example, the measured genes or proteins can be grouped by biological function or by biomedical pathway. Common statistical methods such as analysis of variance, factor analysis, and function modeling based on basis sets also give rise to grouped variables naturally. A large and varied literature addresses the analysis of high-dimensional data. In many applications the data are both high-dimensional and heteroscedastic, and it is then more appropriate to use a piecewise linear regression model that fits each data segment separately, with the segments separated by change points. Most existing research on data with change points, however, concerns the low-dimensional setting, and high-dimensional change-point models have received little attention.
In recent years, most of the literature on change-point models and high-dimensional regression has built models under the assumption of zero-mean errors with bounded variance. It is well known, however, that outliers can cause large errors in least-squares estimation. When the error distribution is non-Gaussian or heavy-tailed, or when it is unclear whether the error distribution changes across a change point, change-point detection also becomes problematic. Quantile regression is better suited to these situations and is particularly attractive for high-dimensional data analysis. In a multiple change-point model, estimation of the change points can affect the properties of the estimators, and the main difficulty in studying such models is the dependence between the two types of parameters: the regression parameters and the change-point parameters. Yet quantile change-point regression for high-dimensional data has been little studied. A common workaround is to start from the practical problem and obtain results through repeated trial and error, which is tedious and becomes unwieldy when the change-point parameters are correlated with the segment-wise regression parameters or when the error distribution changes across a change point. For practical use, it is therefore necessary to treat both types of parameters simultaneously and to keep the method simple to apply in high-dimensional problems.
Studying grouped explanatory variables in a high-dimensional setting with change points requires both identifying the important groups of regressors and establishing a hierarchical structure among those groups. When covariates in a regression problem are naturally grouped, the Group Lasso penalty is an attractive variable selection method because it respects the grouping structure in the data. This thesis uses high-dimensional change-point quantile regression, that is, it studies the change-point problem when a multiphase model changes. It first constructs a high-dimensional change-point quantile regression model and estimates the change points and coefficient parameters with an adaptive Group Lasso penalty. It then studies the asymptotic behavior of the parameter estimators and their Oracle property, which covers the selection of the relevant variable groups without requiring hypothesis tests. When the change points are unknown, the thesis uses the SQ test to detect and assess them. Monte Carlo simulations show that the method performs well in high-dimensional quantile change-point models compared with other methods in the literature. Finally, an analysis of real data illustrates the effectiveness and practicality of the model and method. |
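The kind of objective the abstract describes can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a two-phase model with a single known change-point index, the standard quantile check loss, and a group penalty with user-supplied adaptive weights (a common choice, assumed here, is the inverse norm of an initial estimate of each group). All function and parameter names are hypothetical.

```python
import numpy as np


def check_loss(residuals, tau):
    """Sum of the quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(residuals, dtype=float)
    return float(np.sum(u * (tau - (u < 0))))


def adaptive_group_penalty(beta, groups, weights):
    """Weighted sum of Euclidean norms of the coefficient groups."""
    beta = np.asarray(beta, dtype=float)
    return float(sum(w * np.linalg.norm(beta[idx])
                     for idx, w in zip(groups, weights)))


def objective(y, X, change_point, betas, tau, lam, groups, weights):
    """Penalized check loss for a two-phase model split at a known row index.

    betas holds one coefficient vector per phase (an assumption of this
    sketch); rows before change_point use betas[0], the rest use betas[1].
    """
    segments = [(slice(0, change_point), betas[0]),
                (slice(change_point, len(y)), betas[1])]
    total = 0.0
    for rows, beta in segments:
        total += check_loss(y[rows] - X[rows] @ beta, tau)
        total += lam * adaptive_group_penalty(beta, groups, weights)
    return total
```

In practice such an objective would be minimized over the coefficients, and over candidate change points when they are unknown; the sketch only evaluates it for given values.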
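Degree type | Master's |
Defense date | 2020-05-24 |
Place of degree conferral | Lanzhou, Gansu Province |
Research direction | Complex data analysis |
Language | Chinese |
Total pages | 52 |
Page numbers of manually pasted images in the print version | 0 |
Total figures | 0 |
Total tables | 0 |
Total references | 0 |
Collection number | 0002659 |
Confidentiality level | Public |
Chinese Library Classification number | O212/8 |
Confidentiality period | 0 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/18958 |
Collection | School of Statistics and Data Science |
Recommended citation (GB/T 7714) | Mu Juan. Adaptive Group Lasso Penalty Quantile Regression Method for High-Dimensional Change-Point Model [D]. Lanzhou, Gansu Province. Lanzhou University of Finance and Economics, 2020. |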
Files in this item |
File name/size | Document type | Version type | Access type | License |
35395.pdf (6961KB) | Thesis | | Open access | CC BY-NC-SA |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.