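Author: 牛玺娟
Name (Hanyu Pinyin): niuxijuan
Student ID: 2020071400007
Institution: 兰州财经大学 (Lanzhou University of Finance and Economics)
Telephone: 18797321067
Email: 2014107@qhnu.edu.cn
Year of enrollment: 2020-9
Degree category: Doctoral degree
Level of study: Doctoral student
Discipline category: Economics
First-level discipline: Statistics
Research direction: Statistics
Discipline code: 0714
First supervisor: 庞智强
First supervisor (Hanyu Pinyin): pangzhiqiang
First supervisor's institution: 兰州财经大学
First supervisor's title: Professor
Title: 误差叠加条件下有限总体方差的校准估计研究
English title: Calibration Estimation of the Finite Population Variance under Cumulative Errors
Keywords (Chinese): 总体方差; 辅助信息; 非抽样误差; 误差叠加; 校准估计
Keywords (English): Population variance; Auxiliary information; Non-sampling errors; Errors accumulation; Calibration estimation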
Abstract (translated from the Chinese original)

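As a key measure of the dispersion of data, the population variance must be estimated accurately in order to understand the characteristics of a population, formulate policy, and make sound decisions. Population variance estimation provides information about the accuracy of estimators and permits valid conclusions about the true values of population parameters; it is widely needed both in theoretical research on statistical inference and in applied research in industry, agriculture, the biological sciences, and many other fields. The earliest approach to estimating the population variance was direct estimation from a sample survey, taking the sample variance of the sample data as the estimate of the population variance. Errors, however, affect the estimation of the population variance, and every sample survey faces two kinds of error: sampling error and non-sampling error. Sampling error can be controlled by measures such as increasing the sample size, but non-sampling error, in particular non-response error and measurement error, is often hard to avoid and has a marked effect on the estimate of the population variance. It is therefore extremely important to develop estimation methods for the finite population variance that cope as far as possible with the effects of the various errors, and to put appropriate regulatory measures in place.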

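Existing research on population variance estimation has several shortcomings. First, most researchers consider only the effect of sampling error on the estimation of the population variance and do not give adequate attention to non-sampling error. Second, because the causes of non-sampling error are complex and its types numerous, existing studies of population variance estimation consider only a single type of non-sampling error and do not address estimation when several non-sampling errors are superimposed. Third, most studies consider population variance estimation only under simple random sampling; for other sampling designs, the case of superimposed errors has likewise not been analyzed systematically. Finally, non-sampling error makes outliers likely in survey data, which harms the precision and stability of estimation, a point that few scholars have considered. Four problems therefore remain open: (1) how to account for the joint effect of sampling and non-sampling error in estimating the population variance; (2) how to incorporate the combined effect of several superimposed non-sampling errors into the estimation so as to obtain more precise results; (3) how to analyze systematically, under different sampling designs, the specific effect of superimposed errors on population variance estimation; and (4) how to improve the precision of population variance estimation when the sample data contain outliers or are unstable.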

This study addresses the calibration estimation of the finite population variance when errors are superimposed. The main work is as follows.

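First, sampling error and non-sampling error are considered together in the estimation of the finite population variance. Starting from the perspective of error superposition, the study analyzes in depth the errors commonly present in sample surveys, both sampling and non-sampling, and in particular the non-response error and measurement error among the non-sampling errors, and it identifies and quantifies their specific effects on the estimation of the finite population variance. Against this background, several types of estimator of the finite population variance are proposed: simple, ratio, regression, exponential, and general estimators.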

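Second, estimation methods for the finite population variance are established under different sampling designs. Because data structures and research objectives differ across sampling designs, population variance estimation has to be treated separately for each design. Simple random sampling is a basic sampling method that provides the most direct and valuable data for studying a population, and its theory is relatively mature. In practice, however, accuracy and operability of sampling make it necessary to consider other designs as well, such as stratified sampling, multi-stage sampling, cluster sampling, and double sampling. In such designs in particular, the variety of data sources makes the resulting non-sampling errors more complex, so traditional methods of population variance estimation no longer apply and more elaborate and refined estimation techniques are needed. For three designs, simple random sampling, stratified sampling, and double sampling, this study proposes finite population variance estimation methods that deal with non-response error and measurement error, and it compares the estimation efficiency of the various methods.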

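Third, estimation methods for the finite population variance are proposed for the case where several non-sampling errors are superimposed. In real surveys, non-response error and measurement error often coexist. Their presence not only reduces the representativeness of the data but can also bias the estimates and increase their variance, thereby markedly degrading the estimation of the population variance. The study analyzes the effect on population variance estimation of non-response error, measurement error, and other non-sampling errors when they are superimposed, and proposes finite population variance estimation methods for the case where non-response and measurement error occur together. Corresponding estimators are constructed for simple random sampling, stratified sampling, and double sampling. Expressions for the bias and mean squared error of these estimators are then derived to a first-order approximation, and the advantages and effectiveness of the proposed methods relative to existing estimation methods are evaluated in terms of their statistical properties.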

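Fourth, calibration estimation is incorporated to deal with outliers in the sample data and to improve the precision of population variance estimation. The study applies calibration estimation to the estimation of the finite population variance with the aim of obtaining more accurate estimates. The core of the method is to use auxiliary information to adjust the sample weights so as to reduce the adverse effect of errors on the estimation of the population variance. For simple random sampling, stratified sampling, and double sampling, the proposed variance estimators are each calibrated, giving calibration estimators of the finite population variance under superimposed errors. For simple random sampling, the sample weights are adjusted according to the association between the sample data and the auxiliary variable. In stratified random sampling, the process is refined further: the sample units in each stratum are given calibrated weights so that the within-stratum and between-stratum variance estimates are more precise. For the case where the population auxiliary information is unknown, the study systematically analyzes calibration estimation of the finite population variance under double sampling with superimposed errors; it also computes the bias and mean squared error of these estimators and assesses the calibration estimators in terms of accuracy, efficiency, and robustness. Finally, simulation experiments and applications to real data verify the effectiveness of the proposed calibration estimation methods.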

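The results show that the variance estimators proposed in this study give more precise and reliable estimates in the presence of sampling and non-sampling error, in particular when non-response error and measurement error are present. This finding provides technical support and a theoretical basis for improving the accuracy and reliability of finite population variance estimation, and it has theoretical and practical significance for improving the quality of analysis of sample survey data and the decision support built on it.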

English abstract

The population variance is a key indicator of the dispersion of data, and estimating it accurately is crucial for understanding the characteristics of a population, formulating policies, and making scientific decisions. Variance estimation provides information on the accuracy of estimates and allows valid conclusions to be drawn about the true values of population parameters. It is in wide demand in theoretical studies of statistical inference as well as in applied research in fields such as industry, agriculture, and bioscience. The initial method of estimating the population variance was direct estimation from a sample survey, using the sample variance of the sample data. However, errors affect the estimation of the population variance, and the accuracy of sample survey results is often affected by both sampling errors and non-sampling errors. Although sampling errors can be controlled by increasing the sample size, non-sampling errors, especially non-response errors and measurement errors, are often unavoidable and have a significant impact on the estimation of the population variance. It is therefore extremely important to develop methods for estimating the finite population variance under the influence of various errors and to establish appropriate regulatory measures.

Current research on the estimation of the population variance has several shortcomings. First, most researchers consider only the impact of sampling errors on the estimation of the population variance, without fully addressing non-sampling errors. Second, because non-sampling errors have complex causes and come in many types, existing research considers only a single type of non-sampling error and does not address estimation of the population variance under the combined influence of multiple non-sampling errors. Third, most research considers estimation of the population variance only under a simple random sampling design, without systematically analyzing the combined impact of errors in complex sampling designs. Lastly, because of non-sampling errors, outliers are prone to appear in sampling data and affect the accuracy and stability of the estimation, an issue that few scholars have considered. Current research therefore still needs to address the following issues: first, how to account for the dual impact of sampling errors and non-sampling errors in the estimation of the population variance; second, how to incorporate the combined effects of multiple non-sampling errors into the estimation of the population variance to achieve more precise results; third, how to systematically analyze the specific impact of overlapping errors on population variance estimation under different sampling designs; fourth, how to improve the precision of population variance estimation when the sampling data contain outliers or are unstable.

This study focuses on the calibration estimation of the finite population variance under cumulative errors. The main research work includes the following aspects:

First, the study considers both sampling errors and non-sampling errors in the estimation of the finite population variance. From the perspective of cumulative errors, it analyzes in depth the errors commonly present in sampling surveys, including sampling errors and non-sampling errors, especially the two main types of non-sampling error: non-response errors and measurement errors. The study identifies and quantifies the specific impacts of these errors on the estimation of the finite population variance. In this context, it proposes several types of estimators for the finite population variance: classical estimators, ratio estimators, regression estimators, exponential estimators, and a general class of estimators.
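To make the estimator families named above concrete, the following minimal Python sketch shows forms commonly seen in the survey-sampling literature for the case where the population variance of an auxiliary variable x is known. It is an illustration only and does not reproduce the thesis's own estimators, which additionally handle non-response and measurement error. The function name `variance_estimators`, the parameter `Sx2`, the constant `b`, and the simulated data are all assumptions introduced here.

```python
import math
import random
import statistics


def variance_estimators(y_sample, x_sample, Sx2, b=1.0):
    """Textbook-style estimators of the population variance of y.

    Sx2 is the (assumed known) population variance of the auxiliary
    variable x; b is a user-chosen constant for the regression-type
    form (hypothetical default, not an optimal value).
    """
    sy2 = statistics.variance(y_sample)  # sample variance of y, divisor n - 1
    sx2 = statistics.variance(x_sample)  # sample variance of x, divisor n - 1
    return {
        "simple": sy2,
        "ratio": sy2 * Sx2 / sx2,
        "regression": sy2 + b * (Sx2 - sx2),
        "exponential": sy2 * math.exp((Sx2 - sx2) / (Sx2 + sx2)),
    }


# Illustrative data (hypothetical): a finite population with correlated x and y.
rng = random.Random(42)
x_pop = [rng.uniform(10, 50) for _ in range(500)]
y_pop = [2.0 * x + rng.gauss(0, 5) for x in x_pop]

idx = rng.sample(range(500), 60)  # simple random sample without replacement
est = variance_estimators(
    [y_pop[i] for i in idx],
    [x_pop[i] for i in idx],
    Sx2=statistics.variance(x_pop),
)
for name, value in est.items():
    print(f"{name:12s}{value:10.2f}")
print(f"{'target':12s}{statistics.variance(y_pop):10.2f}")
```

All four forms coincide with the sample variance of y whenever the sample variance of x happens to equal `Sx2`; they differ only in how they use the gap between the two.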

Second, estimation methods for the finite population variance are established under various sampling designs. Because data structures and research objectives differ across sampling designs, the estimation of the population variance must be considered for each design separately. Simple random sampling, as a fundamental sampling method, provides the most direct and valuable data for the study of the population, and it is theoretically mature. In practice, however, accuracy and operability of sampling make it necessary to consider other, more complex sampling designs as well, such as stratified sampling, multi-stage sampling, cluster sampling, and double sampling. In these designs especially, the diversification of data sources makes the resulting non-sampling errors more complex. Traditional methods for estimating the population variance are therefore no longer applicable, and more complex and refined estimation techniques need to be developed. In this study, methods for estimating the finite population variance that account for the impact of non-response and measurement errors are proposed for simple random sampling, stratified sampling, and double sampling, and the estimation efficiency of the various methods is compared.

Third, the study proposes methods for estimating the finite population variance when multiple non-sampling errors are present. In actual survey research, non-response errors and measurement errors often coexist. These errors not only reduce the representativeness of the data but can also lead to biased results and increased variance, thereby significantly affecting the estimation of the population variance. This study analyzes the impact of multiple non-sampling errors, such as non-response and measurement errors, on the estimation of the population variance when they are present together, and proposes methods for estimating the finite population variance in that case. For simple random sampling, stratified sampling, and double sampling, corresponding finite population variance estimators are constructed. Furthermore, expressions for the bias and mean squared error of these estimators are derived to a first-order approximation, and the advantages and effectiveness of the proposed methods relative to existing estimation methods are evaluated in terms of statistical properties.
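A small Monte Carlo sketch can illustrate why the two errors matter for the naive sample variance. It assumes that non-response occurs completely at random and that measurement error is additive, zero-mean, and independent of y. Both are simplifying assumptions made here, not the thesis's model, and `observe` and `mc_bias_mse` are hypothetical helper names. The thesis derives bias and mean squared error analytically to first order; simulation is used here only as a stand-in.

```python
import random
import statistics


def observe(y_values, sigma_u, response_prob, rng):
    """Return the values actually recorded for a sample.

    Each unit responds with probability response_prob (assumed to be
    completely at random); respondents report y + u, u ~ N(0, sigma_u^2).
    """
    recorded = []
    for y in y_values:
        if rng.random() < response_prob:
            recorded.append(y + rng.gauss(0.0, sigma_u))
    return recorded


def mc_bias_mse(population, n, sigma_u, response_prob, reps, seed=0):
    """Monte Carlo bias and MSE of the naive sample variance."""
    rng = random.Random(seed)
    target = statistics.variance(population)  # S_y^2 with divisor N - 1
    errors = []
    for _ in range(reps):
        sample = rng.sample(population, n)  # SRS without replacement
        recorded = observe(sample, sigma_u, response_prob, rng)
        if len(recorded) < 2:  # variance undefined; skip this replicate
            continue
        errors.append(statistics.variance(recorded) - target)
    bias = statistics.fmean(errors)
    mse = statistics.fmean([e * e for e in errors])
    return bias, mse


# Illustrative run on a hypothetical population.
rng = random.Random(7)
population = [rng.gauss(50, 10) for _ in range(300)]
for sigma_u in (0.0, 5.0):
    bias, mse = mc_bias_mse(population, n=40, sigma_u=sigma_u,
                            response_prob=0.7, reps=300)
    print(f"sigma_u={sigma_u}: bias={bias:.2f}, mse={mse:.2f}")
```

Under these assumptions, additive measurement error inflates the expected naive estimate by approximately `sigma_u**2`, while random non-response mainly reduces the effective sample size. Non-response that depends on y, which this sketch does not model, would introduce further bias.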

Fourth, calibration estimation methods are integrated to address the issue of outliers in sampling data, thereby improving the precision of the population variance estimation. This study applies calibration estimation methods to the estimation of the finite population variance to achieve more accurate results. The core of the method is to adjust the sample weights using auxiliary information, with the goal of reducing the negative impact of errors on the estimation of the finite population variance. For simple random sampling, stratified sampling, and double sampling, the proposed variance estimators are calibrated separately to obtain calibration estimators for the finite population variance in the presence of non-response errors, measurement errors, and their combined effects. For simple random sampling, sample weights are adjusted based on the association between the sample data and the auxiliary variables. In stratified random sampling, this process is further refined by assigning calibrated weights to the sample units within each stratum to ensure more precise intra-stratum and inter-stratum variance estimates. For scenarios where population auxiliary information is unknown, the study systematically analyzes the calibration estimation of the finite population variance under double sampling when the errors are present together. Under these three sampling designs, the study not only proposes the calibrated variance estimators but also calculates their bias and mean squared error and compares them with existing methods to assess the performance of the calibrated estimators in terms of accuracy, efficiency, and robustness. Finally, simulation experiments and applications to real data verify the effectiveness of the proposed calibration estimation methods.
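The weight adjustment described above can be sketched with the chi-square-distance calibration that is standard in the survey literature: the weights w_i are chosen as close as possible to the design weights d_i, subject to reproducing a known auxiliary quantity. The single-constraint version below calibrates the squared deviations of x to a known population variance `Sx2`. It is a sketch under stated assumptions (simple random sampling, complete and error-free responses, design weights d_i = 1/(n-1)), and `calibrate_weights` and `calibrated_variance` are hypothetical names, not the thesis's estimators.

```python
import statistics


def calibrate_weights(d, z, target, q=None):
    """Chi-square-distance calibration with one constraint.

    Minimizes sum((w_i - d_i)^2 / (d_i * q_i)) subject to
    sum(w_i * z_i) == target. Closed form:
    w_i = d_i + d_i*q_i*z_i * (target - sum(d*z)) / sum(d*q*z^2).
    """
    if q is None:
        q = [1.0] * len(d)  # equal tuning constants (assumption)
    gap = target - sum(di * zi for di, zi in zip(d, z))
    denom = sum(di * qi * zi * zi for di, qi, zi in zip(d, q, z))
    return [di + di * qi * zi * gap / denom for di, qi, zi in zip(d, q, z)]


def calibrated_variance(y, x, Sx2):
    """Calibrated estimator of the population variance of y.

    Weights start at 1/(n-1), so the uncalibrated estimator is the
    ordinary sample variance; they are then calibrated so that the
    weighted squared deviations of x reproduce the known Sx2.
    """
    n = len(y)
    ybar = statistics.fmean(y)
    xbar = statistics.fmean(x)
    d = [1.0 / (n - 1)] * n
    zx = [(xi - xbar) ** 2 for xi in x]
    zy = [(yi - ybar) ** 2 for yi in y]
    w = calibrate_weights(d, zx, Sx2)
    return sum(wi * zyi for wi, zyi in zip(w, zy))


# Illustrative values (hypothetical).
y = [12.0, 15.0, 9.0, 20.0, 14.0, 17.0]
x = [30.0, 36.0, 25.0, 48.0, 33.0, 41.0]
print(calibrated_variance(y, x, Sx2=70.0))
```

With `q` set to 1/z_i (for nonzero z_i), the calibrated weights become d_i scaled by `Sx2` over the sample variance of x, so the estimator reduces to the ratio-type form; the default `q` of ones gives a regression-type adjustment instead.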

The research results indicate that the variance estimators proposed in this study provide more accurate and reliable estimates in the presence of sampling errors and non-sampling errors, especially non-response errors and measurement errors. This finding provides technical support and a theoretical foundation for improving the accuracy and reliability of finite population variance estimation, and it has important theoretical and practical implications for enhancing the quality of data analysis and decision support in sampling surveys.

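Degree type: Doctorate
Defense date: 2024-12-14
Place of degree conferral: Lanzhou, Gansu Province
Language: Chinese
Total pages: 203
Total number of references: 12
Call number: D00013
Confidentiality level: Public
Chinese Library Classification number: C8/13
Document type: Dissertation
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/38922
Collection: 统计与数据科学学院 (School of Statistics and Data Science)
Recommended citation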
GB/T 7714
牛玺娟. 误差叠加条件下有限总体方差的校准估计研究[D]. 甘肃省兰州市. 兰州财经大学,2024.
Files in this item
File name/size: 10741-2020071400007- (6676KB); document type: dissertation; access type: open access; license: CC BY-NC-SA
File name: 10741-2020071400007-LW.pdf
Format: Adobe PDF