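Author: 夏丽丽
Name (Hanyu Pinyin): Xia Li Li
Student ID: 2016000003091
Training institution: Lanzhou University of Finance and Economics (兰州财经大学)
Phone: 17352230819
Email: bjxialili@163.com
Year of enrollment: 2016
Degree category: Academic master's
Training level: Master's student
Discipline category: Science
First-level discipline: Statistics
Discipline direction: Mathematical Statistics
Discipline code: 0714Z3
Degree awarded: Master of Science
First supervisor: 田茂再
First supervisor name (Hanyu Pinyin): Tian Mao Zai
First supervisor institution: Lanzhou University of Finance and Economics (兰州财经大学)
First supervisor title: Professor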
Title: 一类带约束的零膨胀计数模型的半参数回归分析
English title: Semiparametric Regression Analysis of a Class of Constrained Zero Inflated Counting Model
Keywords (Chinese): 带约束的零膨胀计数模型; 部分线性可加; 半参数回归; 大样本性质; EM算法; Monte Carlo模拟
Keywords (English): Constrained zero-inflated count model; Partially linear additive model; Semiparametric regression; Large-sample properties; EM algorithm; Monte Carlo simulation
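Abstract

Zero-inflated count data arise frequently in scientific research. What such data have in common is that the variance of the observations exceeds the mean, a problem usually called overdispersion. With complex zero-inflated count data, it is sometimes unclear whether a given covariate affects the zero-inflation part, the distribution mean, or both. The current remedy is to start from the practical context and then settle the question through repeated simulation, which is laborious and imprecise. When a covariate affects only the zero-inflation rate or only the distribution mean, the analysis is not difficult. When it affects both parts and the two are related in some way, however, an ordinary zero-inflated count model is no longer realistic and produces large errors.

To address this problem, this thesis studies the case in which a covariate affects both parts, under the assumption that the non-zero-inflation rate and the mean of the underlying distribution are linearly related. Fishing data are an example: the catch depends on many factors, and water depth, longitude, and latitude, for instance, affect both the average daily catch and the non-zero catch rate, so the two are likely to be linearly related. Such data can be studied with a constrained zero-inflated count model, where the constraint is the linear relationship between the non-zero-inflation rate and the distribution mean. Although the constraint introduces a new parameter, the constrained model is more parsimonious than the unconstrained one and has fewer parameters overall. Systematic research on this type of data is currently lacking in China, so this thesis focuses on constrained zero-inflated count data. Nonparametric regression techniques rely mainly on local estimation and need enough data points to give reasonably accurate estimates; meeting that requirement runs into the curse of dimensionality. Semiparametric regression is a reasonable compromise.

This thesis uses a partially linear additive model in the link function for regression analysis of the constrained zero-inflated count model. The nonparametric part is estimated with smoothing splines, mainly to keep the fitted curve from overfitting the data points. Estimating the parameters directly from the log-likelihood proved quite difficult, because the calculus involved is heavy and no analytical solution is available. The data can, however, be treated as incomplete: for a zero in zero-inflated count data, it is unknown whether it comes from the count distribution or is a structural zero. The EM algorithm is therefore introduced, and the nonlinear estimator is combined with penalized likelihood estimation. Under complete data, the penalized log-likelihood of the model is obtained; the parametric and nonparametric parts are then estimated with the EM algorithm and the Newton-Raphson algorithm, and proofs of the relevant large-sample properties are given. So that the analysis better reflects the features of the data, the thesis also examines the constrained zero-inflated count model under different discrete distributions, so that a model can be chosen to match the data. Finally, Monte Carlo simulation and a real-data analysis are used to verify the effectiveness of the method.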
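As a point of reference for the overdispersion mentioned in the abstract, the standard zero-inflated Poisson model can be written out as below. This is the textbook unconstrained form, not the constrained model of the thesis, whose exact parameterization is not given in this record. The symbols $\pi$ (structural-zero probability) and $\lambda$ (Poisson mean) are notation assumed here for illustration.

```latex
\begin{aligned}
P(Y = 0) &= \pi + (1-\pi)\,e^{-\lambda},\\
P(Y = k) &= (1-\pi)\,\frac{e^{-\lambda}\lambda^{k}}{k!}, \qquad k = 1, 2, \dots,\\
\mathrm{E}(Y) &= (1-\pi)\,\lambda,\\
\mathrm{Var}(Y) &= (1-\pi)\,\lambda\,(1 + \pi\lambda) \;\ge\; \mathrm{E}(Y).
\end{aligned}
```

The inequality is strict whenever $0 < \pi < 1$ and $\lambda > 0$, which is why excess zeros make the variance exceed the mean.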

English abstract

Zero-inflated data are often encountered in scientific research. What such data have in common is that the variance of the observations exceeds the mean, which is usually called overdispersion. Research on such data is already extensive: many scholars have studied the zero-inflated Poisson regression model, the zero-inflated negative binomial regression model, and the nonparametric zero-inflated Poisson regression model. These studies, however, cover only ordinary zero-inflated count data. With more complicated data, it is sometimes unclear whether a given covariate affects the zero-inflation part, the distribution mean, or both. The current remedy is to start from the practical context and then run simulation after simulation, which is cumbersome and inaccurate. Studying the effect of a covariate on the zero-inflation rate alone or on the distribution mean alone is not difficult. When a covariate affects both parts and the two are related, however, an ordinary zero-inflated count model is impractical and leads to large errors.

To address this problem, this thesis studies the case in which a covariate affects both parts, assuming a linear relationship between the non-zero-inflation rate and the distribution mean. For example, the price of a product affects both the average daily sales volume and the purchase probability. If the two are known to be linearly related, a constrained zero-inflated count model can be used, with the constraint being the linear relationship between the non-zero-inflation rate and the distribution mean. Although the constrained model introduces a new parameter, it is more concise than the unconstrained model and has fewer parameters. There is currently no systematic research on this type of data in China, so this thesis focuses on zero-inflated count data with constraints.

For the link function of the model, some scholars have used nonparametric regression. Although nonparametric regression avoids the subjectively assumed functional form of parametric regression, most existing estimation methods for it are local and need enough data points to give accurate estimates; meeting that requirement runs into the curse of dimensionality. A semiparametric regression model is a compromise. This thesis uses a partially linear additive model in the link function for regression analysis of the constrained zero-inflated count model. The nonparametric part is estimated with smoothing splines. The basic idea of B-spline-based smoothing spline regression is to add a penalty built from the second-order differences of the spline basis coefficients to the objective function, mainly to prevent the curve from overfitting the data points.

Estimating the parameters directly from the log-likelihood is quite difficult, because the calculus involved is heavy and an analytical solution is hard to find. The data can, however, be treated as incomplete: for a zero in zero-inflated count data, it is unknown whether it comes from the count distribution or is a structural zero. The EM algorithm is therefore introduced, and the nonlinear estimator is combined with penalized likelihood estimation to obtain a penalized log-likelihood based on the complete data. The parametric and nonparametric parts of the model are then estimated with the EM algorithm and the Newton-Raphson algorithm, and the relevant large-sample properties are proved. So that the analysis better reflects the characteristics of the data, the thesis also analyzes the constrained zero-inflated count model under different discrete distributions (Poisson, generalized Poisson, and binomial), so that a model can be selected according to the data. The effectiveness of the method is verified by Monte Carlo simulation and a case study.
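The missing-data idea in the abstract (an observed zero may be structural or may come from the count distribution) can be illustrated with a minimal EM sketch for an intercept-only zero-inflated Poisson model. This is a deliberately simplified stand-in, not the thesis's estimator: it has no covariates, no constraint, no spline penalty, and no Newton-Raphson step. The function names `sample_zip` and `fit_zip_em` and all parameter values are hypothetical.

```python
import math
import random


def sample_zip(n, pi, lam, rng):
    """Draw n values: structural zero with probability pi, else Poisson(lam)."""
    data = []
    for _ in range(n):
        if rng.random() < pi:
            data.append(0)  # structural zero
            continue
        # Knuth's multiplication method for a Poisson draw (fine for small lam)
        limit, k, prod = math.exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        data.append(k)
    return data


def fit_zip_em(y, n_iter=200, tol=1e-10):
    """EM for an intercept-only zero-inflated Poisson model (illustrative only)."""
    n = len(y)
    n_zero = sum(1 for v in y if v == 0)
    total = sum(y)
    pi, lam = 0.5, max(total / n, 1e-8)  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: closed-form updates given the expected zero labels
        new_pi = n_zero * w / n
        new_lam = total / (n - n_zero * w)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


if __name__ == "__main__":
    rng = random.Random(0)
    y = sample_zip(5000, pi=0.3, lam=2.0, rng=rng)
    print(fit_zip_em(y))
```

In this simplified setting both M-step updates have closed forms. With covariates and a spline penalty, as described in the abstract, the M-step would instead need an iterative solver such as Newton-Raphson.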

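Degree type: Master's
Defense date: 2019-05-25
Place of degree conferral: Lanzhou, Gansu Province
Research direction: Semiparametric regression
Language: Chinese
Total pages: 53
Page numbers of manually pasted images in the print version: 0
Total figures: 0
Total tables: 0
Total references: 37
Collection number: 0002861
Confidentiality level: Public
Chinese Library Classification: O212/2
Confidentiality period: 0
Document type: Thesis
Item identifier: http://ir.lzufe.edu.cn/handle/39EH0E1M/19234
Collection: School of Statistics and Data Science (统计与数据科学学院)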
Recommended citation
GB/T 7714
夏丽丽. 一类带约束的零膨胀计数模型的半参数回归分析[D]. 甘肃省兰州市. 兰州财经大学,2019.
Files in this item
File name/size | Document type | Version type | Access type | License
35016.pdf (1343KB) | Thesis | | Not yet open | CC BY-NC-SA
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.