Institutional Repository of School of Statistics
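Author | 徐嘉璐 |
Name in pinyin | Xu Jialu |
Student ID | 2020000003059 |
Training institution | Lanzhou University of Finance and Economics |
Phone | 13099137076 |
Email | 2399192600@qq.com |
Enrollment date | 2020-09 |
Degree category | Professional master's |
Training level | Master's student |
First-level discipline | Applied Statistics |
Discipline code | 0252 |
First supervisor | 郭精军 |
First supervisor's name in pinyin | Guo Jingjun |
First supervisor's institution | Lanzhou University of Finance and Economics |
First supervisor's title | Professor |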
Title | 几种变量选择方法的对比研究及应用 |
English title | A Comparative Study of Several Variable Selection Methods and Their Application |
Keywords | High-dimensional data; Variable selection; Prediction accuracy; Stability |
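Abstract | In today's era of rapid scientific and technological development, data are woven into everyday life, and every part of it carries a vast amount of information. Because not all of these data are useful, the question becomes how to organize and analyze them. Variable selection is a key step, since it affects a model's interpretability, predictive ability, and robustness to interference. This thesis first reviews and summarizes the basic theory behind the penalty functions it covers: the principles by which they achieve variable selection, their characteristics, and how they relate to one another. It then compares several variable selection methods on different types of data in terms of stability, accuracy, and computational complexity. First, under different degrees of correlation: on conventional data with moderate or lower correlation, PCR performs well, whereas when the correlation coefficient exceeds 0.8, the spcr and spcLasso methods show their advantage in handling highly correlated data, with all indicators remaining at a good level. Second, under different data dimensions: on conventional data, the methods involving sparse principal components are less stable; as dimensionality increases, the stability and accuracy of all the methods considered decline to some extent, but those of spcLasso decline less than those of the other methods. Third, running time in different data settings: the sparse principal component methods take longer because of the higher complexity of their iterations. The purpose of this analysis is to bring out the characteristics of each method, not to show that any one method is absolutely superior, but to show that different models suit different types of data. Research on this topic is still relatively scarce, and the comparison is also intended as a reference for variable selection in practical applications. Finally, using "a study of China's railway passenger traffic" as a case, relevant indicator data for 2010 to 2022 are selected and the methods discussed are applied to them. The results of the case analysis are consistent with practical reality, indicating that the variable selection models obtained above can offer new ideas for handling real-world problems. |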
English abstract | With today's rapid development of science and technology, data have become part of daily life, and every part of it contains a huge amount of information. Not all of these data are useful, so the problem that follows is how to organize and analyze them. Variable selection is one of the key steps, as it affects a model's interpretability, predictive ability, and resistance to interference. First, the basic theory of the penalty functions and variable selection methods involved in this thesis is introduced and summarized. Next, the stability, accuracy, and computational complexity of several variable selection methods are compared under different types of data. First, under different degrees of correlation: PCR performs better on regular data when the correlation is moderate or lower, whereas when the correlation coefficient is greater than 0.8, the spcr and spcLasso methods show their advantages in processing highly correlated data. Second, under different data dimensions: on conventional data, the methods containing sparse principal components are less stable; as dimensionality increases, the stability and accuracy of all the methods are reduced, but those of the spcLasso method decline less than those of the other methods. Third, running time in different data environments: the sparse principal component methods take a long time because of their high iterative complexity. These results indicate that the variable selection method should be chosen to suit the type of data set. The analysis was conducted to uncover the characteristics of the respective methods, not to show that one method has an absolute advantage, but to illustrate that different models are applicable to different types of data. Few studies exist in this area so far, and the analysis is also meant to provide a reference for variable selection in practical applications.
Finally, taking "research on China's railway passenger traffic" as a case, the relevant indicator data from 2010 to 2022 are selected and the methods discussed in the thesis are applied to them. The results of the case analysis are consistent with reality, indicating that the variable selection models obtained above can provide new ideas for dealing with real problems. |
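Degree type | Master's |
Defense date | 2023-05-20 |
Place of degree conferral | Lanzhou, Gansu Province |
Language | Chinese |
Total pages | 72 |
Total references | 48 |
Collection number | 0005027 |
Confidentiality level | Public |
CLC number | C8/353 |
Document type | Thesis |
Item identifier | http://ir.lzufe.edu.cn/handle/39EH0E1M/34145 |
Collection | School of Statistics and Data Science |
Recommended citation (GB/T 7714) | 徐嘉璐. 几种变量选择方法的对比研究及应用[D]. Lanzhou, Gansu Province: Lanzhou University of Finance and Economics, 2023. |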
Files in this item |
File name/size | Document type | Version type | Access type | License |
10741_2020000003059_(6530KB) | Thesis | | Open access | CC BY-NC-SA |