English Abstract | Estimating population target variables effectively from sample data is a common problem in statistical inference, and the estimation methods have a wide range of practical applications. Direct estimation based on the sampling design is the most straightforward approach to such problems. However, when the population consists of many small domains, the sample available for estimating a target variable in a given domain may be very small or even empty. In such cases, direct estimation from the sample may produce large errors or fail to give a usable estimate. Small area estimation is one of the effective methods for solving this problem. Unlike traditional sampling estimation methods, small area estimation uses auxiliary variable information to obtain effective estimates of target variables in small areas, so it can handle domains with small or empty samples. In recent years, small area estimation has been widely used in fields such as population statistics, biostatistics, agricultural statistics, and government statistics, and the related academic literature is extensive, which has led to the systematic development of small area estimation theory.
As the main approach to small area estimation, model-based small area estimation is the core of the field, because it incorporates auxiliary variables into the estimation model and thereby exerts a "leverage" effect that makes estimation possible with small or even empty samples. Small area models usually include area-level random effects and model random errors, and both are assumed to be normally distributed. Many empirical studies have shown that the normality assumption does not hold when the data contain outlying observations. This invalidates the basic small area model, so parameter estimates based on the normality assumption, and the resulting estimates of the target variables, can be severely biased. It is therefore necessary to develop robust estimation methods that are insensitive to outliers. Two kinds of methods are widely used in robust small area estimation. The first assumes a heavy-tailed distribution for the model error, for example a t-distribution or a Cauchy distribution, which reduces the influence of outliers on the estimator. The second uses the Huber ϕ-function to robustify the empirical best linear unbiased predictor (EBLUP), relying on the properties of the Huber ϕ-function to achieve robustness. However, when the outliers are very large, both kinds of methods are of limited effectiveness, and in some cases their estimates still show large bias.
Robust small area estimation is a pressing practical issue in the field. Non-normal observed data are widespread and outliers occur frequently, which poses new challenges for small area estimation methods. Because estimates become unstable and prediction bias becomes large in such situations, robust small area estimation has attracted the attention of many scholars. This thesis exploits the favorable properties of the density power divergence family in robust estimation and applies it to small area estimation, proposing a robust estimation method based on this family that addresses the shortcomings of existing robust small area estimation methods. By applying density power divergences to small area estimation, we study the estimation of small area model coefficients and target variables when the observations are non-normal or contain outliers. The aim of this thesis is to construct robust estimators of the small area model coefficients and target variables, and to provide confidence intervals for the parameters and mean squared error estimates for the estimators.
First, this thesis studies robust estimation of area-level models based on the density power divergence and the γ-divergence. Applying the density power divergence to the Fay-Herriot (FH) model yields robust estimators of the model coefficients together with their asymptotic distributions. On this basis, a robust estimator of the target variable is developed and its mean squared error is derived. To assess the reliability of the small area estimator, confidence intervals for the target variables are also given. The proposed method is applied to small area models built on simulated and real data and compared with existing robust estimation methods. The comparison shows that the proposed method balances efficiency and robustness through a tuning parameter. When the observed data contain no outliers, the proposed method with a small tuning parameter gives results similar to those of the existing best linear unbiased estimation methods. When the observed data contain outliers, the proposed method has a smaller mean squared error than the existing methods, which indicates that it is effective.
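As a rough illustration of how a density power divergence can downweight outlying areas, the following Python sketch estimates the Fay-Herriot regression coefficients under simplifying assumptions that are not taken from the thesis: the random-effect variance A and the sampling variances D_i are treated as known, so each direct estimate is marginally y_i ~ N(x_i'β, A + D_i). Minimizing the divergence over β then reduces to an iteratively reweighted least-squares problem in which area i receives a weight proportional to f_i(y_i)^α. The function name `dpd_fh_beta` is hypothetical, and neither the γ-divergence variant nor the robust predictor of the area parameter is shown.

```python
import numpy as np


def dpd_fh_beta(y, X, A, D, alpha, tol=1e-10, max_iter=200):
    """Minimum density power divergence estimate of beta in the FH model.

    Assumption (for illustration only): A and the D_i are known, so that
    y_i ~ N(x_i' beta, A + D_i) marginally. alpha = 0 gives the GLS estimator.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    s2 = A + np.asarray(D, dtype=float)  # marginal variances A + D_i
    # Non-robust GLS starting value.
    beta = np.linalg.solve(X.T @ (X / s2[:, None]), X.T @ (y / s2))
    for _ in range(max_iter):
        r = y - X @ beta
        # Weight proportional to f_i(y_i)^alpha: areas with large standardized
        # residuals receive weights close to zero.
        w = s2 ** (-alpha / 2) * np.exp(-alpha * r**2 / (2 * s2))
        W = w / s2
        # Solve the weighted estimating equation sum_i W_i x_i (y_i - x_i' beta) = 0.
        beta_new = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Example with 20 of 200 areas contaminated by a large shift.
rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.uniform(0.0, 1.0, m)])
beta_true = np.array([1.0, 2.0])
A, D = 1.0, np.full(m, 0.5)
y = X @ beta_true + rng.normal(0.0, np.sqrt(A + D))
y[:20] += 30.0
print(dpd_fh_beta(y, X, A, D, alpha=0.0))  # GLS, pulled toward the shifted areas
print(dpd_fh_beta(y, X, A, D, alpha=0.5))  # the shifted areas are downweighted
```

With `alpha=0.0` every area gets the same weight and the sketch reproduces the GLS estimator, which mirrors the remark that small tuning parameters give results close to the best linear unbiased estimator; larger values of `alpha` trade some efficiency for resistance to contaminated areas.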
Second, the thesis studies robust estimation of unit-level models based on the density power divergence and the γ-divergence. Both divergences are applied to the nested error regression (NER) model to estimate its coefficients robustly, and the asymptotic distributions of these estimators are derived. Under the unit-level model, robust estimators of functionals of the target variable and of the finite-population area mean are studied. Because estimating the target variable in the unit-level model involves multiple integrals, the MCMC method is used to estimate functionals of the target variable, and the bootstrap is used to estimate the MSE of the estimator. As in the area-level case, the proposed method is compared with existing robust estimation methods. Applications to simulated and real data show that the proposed method yields more robust results: for both the model coefficients and the small area estimators of the target variable, it achieves smaller bias and mean squared error. To show how the estimators behave as the contamination changes, the thesis plots the MSE of several estimators against the variance and the proportion of the contaminating component of a mixture of normal distributions. The plots show that, for both the model coefficients and the area means, the MSE of the proposed estimator changes little, whereas the existing robust methods perform poorly and their MSE fluctuates considerably as the contamination proportion and variance change.
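The multiple integral mentioned above is the conditional expectation of a functional g of an area's full response vector given the sampled units. The sketch below shows, for the standard (non-robust) normal NER model y_ij = x_ij'β + v_i + e_ij with known parameters, how simulation replaces that integral: the area effect v_i is drawn from its normal conditional distribution given the sampled units, the non-sampled responses are generated from the model, and g is averaged over the draws. This is plain Monte Carlo under assumed-known parameters, not the MCMC scheme or the robust model of the thesis, and `mc_eb_functional` is a hypothetical name.

```python
import numpy as np


def mc_eb_functional(g, y_s, X_s, X_r, beta, sigma2_v, sigma2_e, L=10000, rng=None):
    """Monte Carlo approximation of E[g(y_i) | y_s] for one area.

    Assumes the normal nested error model with beta, sigma2_v and sigma2_e known.
    y_s, X_s: sampled units of the area; X_r: covariates of non-sampled units.
    g maps the full area vector (sampled then non-sampled units) to a scalar.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(y_s)
    gamma = sigma2_v / (sigma2_v + sigma2_e / n)
    # Conditional distribution of v_i given the sampled units of the area.
    v_mean = gamma * np.mean(y_s - X_s @ beta)
    v_var = gamma * sigma2_e / n
    mu_r = X_r @ beta
    vals = np.empty(L)
    for l in range(L):
        v = rng.normal(v_mean, np.sqrt(v_var))
        y_r = mu_r + v + rng.normal(0.0, np.sqrt(sigma2_e), size=len(mu_r))
        vals[l] = g(np.concatenate([y_s, y_r]))
    return vals.mean()
```

For the area mean (`g=np.mean`) the result agrees with the closed-form conditional expectation up to Monte Carlo error; for a nonlinear functional such as the proportion of units below a threshold `z` (a hypothetical value), one would pass `g=lambda yy: np.mean(yy < z)`, which is where simulation becomes necessary.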
This thesis also proposes a tuning-parameter selection algorithm for robust estimation with the density power divergence. When the two types of small area models are estimated robustly, a tuning parameter is introduced into the estimation method; it adjusts the trade-off between efficiency and robustness according to the characteristics of the observed data. In general, a smaller tuning parameter can be chosen when the model contains few outliers, and a larger one when it contains more. To select the tuning parameter, the thesis introduces an iterative estimation algorithm that automatically chooses the value minimizing the estimated mean squared error (MSE) for the given data, and presents this algorithm in detail.
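The abstract does not spell out the selection algorithm, so the following Python sketch is only one possible scheme under stated assumptions, not the algorithm of the thesis. It reuses a simplified area-level setting (marginal variances `s2` assumed known), estimates MSE(α) for each α on a grid as the bootstrap average of ||β̂*(α) − β̂_pilot||², where the robust pilot estimate β̂_pilot stands in for the unknown β, picks the minimizer, and then repeats with the selected α as the new pilot until the choice stops changing. The names `dpd_beta`, `boot_mse`, `select_alpha`, the grid, and the default pilot value α = 0.5 are all hypothetical.

```python
import numpy as np


def dpd_beta(y, X, s2, alpha, max_iter=200, tol=1e-10):
    """Minimum-DPD estimate of beta for y_i ~ N(x_i' beta, s2_i), s2 assumed known (IRLS)."""
    beta = np.linalg.solve(X.T @ (X / s2[:, None]), X.T @ (y / s2))  # GLS start
    for _ in range(max_iter):
        r = y - X @ beta
        W = s2 ** (-alpha / 2) * np.exp(-alpha * r**2 / (2 * s2)) / s2
        new = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * y))
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


def boot_mse(y, X, s2, alpha, pilot, resamples):
    """Bootstrap estimate of E||beta_hat(alpha) - beta||^2, with beta replaced by the pilot."""
    errs = [np.sum((dpd_beta(y[i], X[i], s2[i], alpha) - pilot) ** 2) for i in resamples]
    return float(np.mean(errs))


def select_alpha(y, X, s2, grid, alpha_pilot=0.5, B=100, max_rounds=10, seed=0):
    """Hypothetical iterative selection of the DPD tuning parameter."""
    rng = np.random.default_rng(seed)
    m = len(y)
    # Resample whole areas with replacement; the same resamples are reused for every alpha.
    resamples = [rng.integers(0, m, m) for _ in range(B)]
    alpha = alpha_pilot
    for _ in range(max_rounds):
        pilot = dpd_beta(y, X, s2, alpha)
        mses = [boot_mse(y, X, s2, a, pilot, resamples) for a in grid]
        new_alpha = grid[int(np.argmin(mses))]
        if new_alpha == alpha:  # the choice has stabilised
            break
        alpha = new_alpha
    return alpha, dpd_beta(y, X, s2, alpha)
```

Because the criterion measures deviation from a robust pilot, α = 0 is heavily penalized when outlying areas pull the GLS fit away, while on clean data the bootstrap variance term dominates and smaller values of α tend to be chosen. The result can depend on the pilot, which is one reason to update it iteratively.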
In summary, this thesis proposes a robust estimation method for the basic small area models based on the density power divergence family, and provides point estimators and interval estimates for the model parameters and target variables. Simulations and validation with real data show that the proposed method performs better than existing robust small area estimation methods and gives satisfactory results for non-normal data and outlying observations, so it can handle small area estimation problems in which the basic model assumptions are not met. In practical applications, the method is easy to implement and estimates effectively, as demonstrated with data from the China Family Panel Studies. The proposed method can be applied to a wider range of small area estimation models and can provide more reliable small area estimates for decision-makers.
|