
An application of the EM algorithm in analyzing the CUNY open-admissions study missing data.

Abstract

The present study is based on an analysis of a sample from the CUNY open-admissions data set. The data set consisted of two portions: an original sample and a follow-up sample that contained only 14% of the original cases. Not only were data missing for the cases not in the follow-up sample, but the original sample variables were also incompletely observed. The data set is essentially multivariate, with both continuous and categorical variables incompletely observed. In analyzing such a data set, many researchers use ad hoc approaches that lack a theoretical basis. For example, many statistical packages offer deletion or substitution methods as a routine treatment for missing values before an analysis is performed.

It is important to note that deletion methods, which use only respondents with no missing values, may yield biased results unless the complete cases can be viewed as a completely random subsample of the original sample observations. A more realistic approach is to assume that the data are not missing completely at random, but rather are missing at random as a function of known subject characteristics. Under this more realistic assumption about the missing-data process, one can apply Maximum Likelihood methods to estimate the parameters of interest. The Maximum Likelihood method was used in the present study.

In this study, the Maximum Likelihood estimates of means, variances, and correlations were obtained by implementing the Expectation-Maximization (EM) algorithm suggested by Little & Schluchter (1985). These Maximum Likelihood estimates were compared with the estimates obtained from three ad hoc methods: pairwise deletion, listwise deletion, and weighting analyses.

Although the results show some differences in the correlation estimates, there was little evidence that the methods yield different estimates of proportions, means, and standard deviations. Possible explanations for this result are discussed. In general, however, the ad hoc and Maximum Likelihood methods will not agree.
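The comparison described in the abstract can be illustrated with a small sketch. It is not the dissertation's implementation. It assumes purely continuous, multivariate normal data, whereas the Little & Schluchter method also handles categorical variables. It further assumes NaN-coded missing values and simulated rather than CUNY data. The function name `em_mvn` and its parameters are hypothetical.

```python
import numpy as np

def em_mvn(X, n_iter=500, tol=1e-8):
    """EM estimates of the mean vector and covariance matrix for data with
    NaN-coded missing values, assuming multivariate normality and data
    that are missing at random (illustrative assumptions)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: available-case means and variances.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                if o.any():
                    # E-step: regress the missing variables on the observed ones.
                    b = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                    x[m] = mu[m] + (x[o] - mu[o]) @ b
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ b
                else:
                    x[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x
            sum_xx += np.outer(x, x) + c
        # M-step: update the parameters from the expected sufficient statistics.
        new_mu = sum_x / n
        new_sigma = sum_xx / n - np.outer(new_mu, new_mu)
        done = (np.max(np.abs(new_mu - mu)) < tol
                and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if done:
            break
    return mu, sigma

# Simulated example: two correlated variables with true means of zero.
rng = np.random.default_rng(0)
full = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
data = full.copy()
# Missing at random: the second variable is unobserved whenever the
# (always observed) first variable is below zero.
data[full[:, 0] < 0, 1] = np.nan

listwise = data[~np.isnan(data).any(axis=1)]  # complete cases only
mu_em, sigma_em = em_mvn(data)
print("listwise means:", listwise.mean(axis=0))
print("EM means:      ", mu_em)
```

In this simulated setting the complete cases are not a random subsample, so listwise deletion shifts both means upward, while the EM estimates should stay close to the true values. When missingness is unrelated to the variables, the two approaches tend to give similar answers.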

Bibliographic record

  • Author: Na, Hazon
  • Author affiliation: City University of New York
  • Degree-granting institution: City University of New York
  • Subjects: Educational psychology; Educational tests and measurements; Statistics
  • Degree: Ph.D.
  • Year: 1992
  • Pages: 97 p.
  • Format of original: PDF
  • Language of text: English