Robust small area estimation with density power divergences

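Author	王朝旭
Name in Hanyu Pinyin	Wang Zhaoxu
Student ID	2019071400002
Degree-granting institution	Lanzhou University of Finance and Economics
Telephone	18797332607
Email	wangzx@qhnu.edu.cn
Year of enrollment	2019-9
Degree category	Doctoral degree
Training level	Doctoral student
First-level discipline	Statistics
Research direction	Mathematical statistics
Discipline code	0714
First supervisor	庞智强
First supervisor's name in Hanyu Pinyin	Pang Zhiqiang
First supervisor's institution	Lanzhou University of Finance and Economics
First supervisor's academic title	Professor
Title	基于密度幂散度族的稳健小域估计
English title	Robust small area estimation with density power divergences
Keywords	Small area estimation; FH model; NER model; Robust estimation; Density power divergence; γ-divergence
Foreign-language keywords	Small area estimation; FH model; NER model; Robust estimation; Density power divergence; γ-divergence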
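Abstract	In statistical inference, how to use sample data to estimate a population target variable effectively is a very common research problem. Such problems also arise frequently in practice, and methods for solving them are in wide demand. Direct estimation based on the sampling design is the most straightforward approach. However, when the population consists of many small areas, estimating the target variable in a small area may involve small samples or even no sample at all. In that case, direct estimation from the sample may produce large errors or fail to give a usable estimate. Small area estimation is one effective way to address this problem. Compared with traditional sampling estimation, small area estimation uses auxiliary variable information to obtain effective estimates of the target variable in small areas, and can handle both the small-sample and the no-sample cases. In recent years, small area estimation has been widely applied in demography, biostatistics, agricultural statistics, and government statistics, and the related literature is extensive, so the theory has developed systematically. Model-based estimation is the main approach to small area estimation and its core content. It incorporates auxiliary variables into the estimation model and thereby "borrows strength" to solve the small-sample and no-sample problems.
Small area models usually assume that both the area random effects and the model random errors are normally distributed. Practice has shown, however, that when outlying observations are present the normality assumption fails, which leads to large biases in the parameter estimates and in the estimates of the target variable. Robust estimation methods that are insensitive to outlying observations are therefore needed. Two kinds of methods are currently widely used in robust small area estimation. The first assumes a heavy-tailed distribution for the model error, for example a t distribution or a Cauchy distribution, and reduces the influence of outlying observations through the heavy-tailed model. The second uses the Huber ϕ function to robustify the empirical linear unbiased estimator, relying on the good properties of that function to achieve robustness. Although both kinds of methods reduce the influence of outlying observations in most cases, their effectiveness is limited when the outliers are very large, and the estimates can still be substantially biased. In practice, an appropriate robust small area estimation method therefore has to be chosen for the situation at hand. Robust small area estimation is a very practical problem in current research. The prevalence of non-normal data and the occurrence of outliers pose new challenges for small area estimation, and robust small area estimation has attracted the attention of many researchers as a way to deal with unstable estimators and large prediction biases in this setting. Considering the important properties of the density power divergence family in robust estimation, this study applies the family to small area estimation and proposes robust estimation methods based on it, to remedy the shortcomings of existing robust small area estimation methods. It investigates the estimation of small area model coefficients and target variables for non-normal data and data with outlying observations, with the aim of constructing robust estimators of the model coefficients and target variables and giving confidence intervals for the parameters and mean squared errors of the estimators.
First, to address robust estimation in area-level models, the study examines the use of the density power divergence and the γ-divergence in the FH model. Applying the density power divergence to the FH model yields robust estimates of the model coefficients and their asymptotic distributions. On this basis, a robust estimator of the target variable is discussed and its mean squared error is given. To assess the reliability of the small area estimates, confidence intervals for the target variable are also given. Small area models are fitted to simulated and real data with the proposed robust method, and the results are compared with those of existing robust methods. The comparison shows that the proposed method can control the balance between efficiency and robustness through its tuning parameter. When the data contain no outliers, the proposed method with a small tuning parameter performs about as well as the existing best linear unbiased estimation method. When the data contain outliers, the proposed method has a smaller mean squared error than existing methods, which indicates that it is effective.
Second, the study examines robust estimation for unit-level models based on the density power divergence and the γ-divergence. The two divergences are applied to the NER model to obtain robust estimates of the model coefficients and their asymptotic distributions. Under the unit-level model, robust estimators of functions of the target variable in an area and of the finite-population area mean are discussed. Because estimating the target variable in the unit-level model involves multiple integrals, the MCMC method is used to estimate functions of the target variable, and the bootstrap is used to obtain the MSE of the estimator. The proposed method is again compared with existing robust methods. Applications to simulated and real data show that the proposed method provides more robust estimates: for both the model coefficients and the small area estimators of the target variable, the results have smaller bias and mean squared error. To show how the proposed method behaves under a mixture of normal distributions, the study plots the MSE of several estimators as the variance and the proportion of the contaminating distribution vary. The plots show that, for both the model coefficients and the area means, the MSE of the proposed method changes little, whereas the existing robust methods perform only moderately and fluctuate considerably with the contamination proportion and the variance of the contaminating distribution.
Finally, the study proposes a parameter selection algorithm for robust estimation with the density power divergence. When the two types of small area models are estimated robustly, the method involves a tuning parameter that adjusts the efficiency and robustness of the estimation according to the characteristics of the observed data. In general, a smaller tuning parameter can be chosen when there are fewer outliers, and a larger one when there are more. To choose the tuning parameter, the study introduces an iterative algorithm that automatically selects the value minimizing the estimated MSE for the data at hand. Based on the density power divergence family, the study thus proposes robust estimation methods for the basic small area models, and gives estimation expressions and interval estimates for the model parameters and target variables. Simulations and real data show that the proposed method outperforms existing robust small area estimation methods and gives satisfactory results for non-normal data and data with outlying observations, so it can handle small area estimation problems in which the basic assumptions are violated. In practical applications the proposed method is also easy to implement and performs well, as supported by an application to China Family Panel Studies data. The proposed method is applicable to a wider range of small area models and can provide more reliable small area estimates for decision-makers.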
English abstract	Estimating population target variables effectively from sample data is a common problem in statistical inference, and the corresponding estimation methods have a wide range of practical applications. Direct estimation based on the sampling design is the most straightforward approach. However, when the population consists of many small areas, estimating the target variable within a small area may involve small samples or no sample at all. In such cases, direct estimation may produce large errors or fail to yield a usable estimate. Small area estimation is one effective way to address this problem. Compared with traditional sampling estimation, small area estimation uses auxiliary variable information to obtain effective estimates of target variables in small areas, including areas with small or no samples. In recent years, small area estimation has been widely used in fields such as population statistics, biostatistics, agricultural statistics, and government statistics, and the related academic literature is extensive, which has led to a systematic development of small area estimation theory. Model-based estimation is the main approach to small area estimation and its core content: it incorporates auxiliary variables into the estimation model and thereby "borrows strength" to solve the small-sample and no-sample problems. Small area models usually assume that both the area random effects and the model random errors follow normal distributions. Many practical studies have shown that the normality assumption does not hold when the data contain outlying observations.
This invalidates the assumptions of the basic small area model, so that parameter estimates and target variable estimates based on normality can be severely biased. It is therefore necessary to explore robust estimation methods that are insensitive to outlying observations. Two kinds of methods are widely used in robust small area estimation. The first assumes a heavy-tailed distribution for the model error, for example the t-distribution or the Cauchy distribution, and thereby reduces the influence of outlying observations on the estimator. The second uses the Huber-ϕ function to robustify the empirical linear unbiased estimator, relying on the properties of that function to achieve robustness. However, when the outliers are very large, the effectiveness of both kinds of methods is limited, and in some cases the estimates can still be severely biased. Robust small area estimation is therefore a practical issue in current small area estimation research. The prevalence of non-normal data and the occurrence of outliers pose new challenges for small area estimation methods. As a way to deal with unstable estimates and large prediction biases in such situations, robust small area estimation has attracted the attention of many scholars. In this thesis, we consider the important properties of the density power divergence family in robust estimation and apply the family to small area estimation, proposing a robust estimation method based on it to address the shortcomings of existing robust small area estimation methods.
By applying the density power divergence family to small area estimation, we investigate the estimation of small area model coefficients and target variables for non-normal data and data with outliers. The aim of this thesis is to construct robust estimators of the small area model coefficients and target variables, and to provide confidence intervals for the parameters and the mean squared error of the estimators. First, robust estimation for area-level models based on the density power divergence and the γ-divergence is studied. Applying the density power divergence to the FH model yields robust estimates of the model coefficients and their asymptotic distributions. On this basis, a robust estimator of the target variable is discussed and its mean squared error is given. To assess the reliability of the small area estimates, confidence intervals for the target variables are also given. Small area models are fitted to simulated and real data with the proposed robust method, and the results are compared with those of existing robust estimation methods. The comparison shows that the proposed method can control the balance between efficiency and robustness through its tuning parameter. When the data contain no outliers, the proposed method with a small tuning parameter gives results similar to those of the existing best linear unbiased estimation method. When the data contain outliers, the proposed method has a smaller mean squared error than existing methods, which indicates that it is effective. Second, robust estimation for unit-level models based on the density power divergence and the γ-divergence is studied.
These two divergences are applied to the NER model to obtain robust estimates of the model coefficients and their asymptotic distributions. Under the unit-level model, robust estimators of functions of the target variable and of the finite-population area mean are studied. Because estimating the target variable in the unit-level model involves multiple integrals, the MCMC method is used to estimate functions of the target variable, and the bootstrap method is used to obtain the MSE of the estimator. The proposed method is again compared with existing robust estimation methods. Applications to simulated and real data show that the proposed method provides more robust estimates: for both the model coefficients and the small area estimators of the target variable, the results have smaller bias and mean squared error. To show how the proposed method behaves under a mixture of normal distributions, the thesis plots the MSE of several estimators as the variance and the proportion of the contaminating distribution vary. The plots show that, for both the model coefficients and the area means, the MSE of the proposed method changes little, whereas the existing robust estimation methods perform less well and fluctuate considerably with the contamination proportion and the variance of the contaminating distribution. Finally, a parameter selection algorithm for robust estimation with the density power divergence is proposed.
When the two types of small area models are estimated robustly, the estimation method involves a tuning parameter that adjusts the efficiency and robustness of the estimation according to the characteristics of the observed data. Generally, a smaller tuning parameter can be selected when there are fewer outliers, and a larger one when there are more. To select the tuning parameter, this thesis introduces an iterative algorithm that automatically chooses the value minimizing the estimated mean squared error (MSE) for the data at hand. This thesis proposes robust estimation methods for the basic small area models based on the density power divergence family, and provides estimation expressions and interval estimates for the model parameters and target variables. Simulations and validation with real data show that the proposed method performs better than existing robust small area estimation methods and gives satisfactory results for non-normal data and data with outliers, so it can handle small area estimation problems in which the basic assumptions are not met. In practical applications, the proposed method is easy to implement and performs well, as demonstrated with the China Family Panel Studies data. The proposed method can be applied to a wider range of small area estimation models and can provide more reliable small area estimates for decision-makers.
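Degree type	Doctorate
Defense date	2023-05-20
Place of degree conferral	Lanzhou, Gansu Province
Language	Chinese
Total pages	220
Total references	158
Call number	D00002
Confidentiality level	Public
Chinese Library Classification number	C8/2
Document type	Dissertation
Item identifier	http://ir.lzufe.edu.cn/handle/39EH0E1M/34239
Collection	School of Statistics and Data Science
Recommended citation (GB/T 7714)	王朝旭. 基于密度幂散度族的稳健小域估计[D]. 甘肃省兰州市. 兰州财经大学,2023.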

Files in this item
File name / size	Document type	Version type	Access type	License
10741-2019071400002-（2683KB）	Dissertation		Open access	CC BY-NC-SA
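
Illustrative sketch (not taken from the thesis): the abstract describes estimating the FH model coefficients by minimizing a density power divergence, with a tuning parameter that trades efficiency against robustness. The Python code below is a minimal sketch of that idea under stated assumptions. It uses the standard density power divergence loss applied to the marginal model y_i ~ N(x_i'β, A + D_i) with known sampling variances D_i, which may differ from the estimator actually derived in the thesis. The function names `dpd_loss` and `fit_fh_dpd`, the simulated data, the contamination scheme, and the choice alpha = 0.5 are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def dpd_loss(params, y, X, D, alpha):
    """Average density power divergence loss for y_i ~ N(x_i' beta, A + D_i)."""
    beta, log_A = params[:-1], params[-1]
    var = np.exp(log_A) + D                       # marginal variance A + D_i
    resid = y - X @ beta
    # f_i(y_i)^alpha for the normal marginal density
    dens_pow = (2 * np.pi * var) ** (-alpha / 2) * np.exp(-alpha * resid**2 / (2 * var))
    # closed form of the integral of f_i^(1 + alpha) for a normal density
    integral = (2 * np.pi * var) ** (-alpha / 2) / np.sqrt(1 + alpha)
    return np.mean(integral - (1 + 1 / alpha) * dens_pow)


def fit_fh_dpd(y, X, D, alpha):
    """Minimize the DPD loss over (beta, A), starting from least squares."""
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]
    A0 = max(np.var(y - X @ beta0) - D.mean(), 0.1)   # crude moment-based start
    start = np.append(beta0, np.log(A0))              # A is optimized on the log scale
    res = minimize(dpd_loss, start, args=(y, X, D, alpha), method="Nelder-Mead",
                   options={"xatol": 1e-6, "fatol": 1e-10, "maxiter": 5000})
    return res.x[:-1], np.exp(res.x[-1])


# Simulated area-level data; every value here is an illustrative assumption.
rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(-1, 1, m)])
beta_true = np.array([1.0, 2.0])
D = rng.uniform(0.5, 1.5, m)                      # known sampling variances
y = X @ beta_true + rng.normal(0, 1.0, m) + rng.normal(0, np.sqrt(D))
y[:20] += 15.0                                    # shift 10% of the areas to create outliers

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_dpd, A_dpd = fit_fh_dpd(y, X, D, alpha=0.5)
print("Least squares coefficients:", beta_ols)
print("DPD coefficients:", beta_dpd, "estimated A:", A_dpd)
```

In this setup the shifted areas pull the least squares intercept upward, while the DPD fit downweights them through the factor f_i(y_i)^alpha. As alpha approaches 0 the loss approaches the negative log-likelihood, which is the efficiency end of the trade-off the abstract mentions.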