An application of the EM algorithm in analyzing the CUNY open-admissions study missing data.

Abstract

The present study is based on an analysis of a sample from the CUNY open-admissions data set. The data set consisted of two portions: an original sample and a follow-up sample that contained only 14% of the original cases. Follow-up data were missing for every case outside the follow-up sample, and the original-sample variables were themselves incompletely observed. The data set is multivariate, with incompletely observed continuous and categorical variables. In analyzing such a data set, many researchers use ad hoc approaches that lack a theoretical basis. For example, many statistical packages offer deletion or substitution methods as a routine treatment for missing values before an analysis is performed.

Deletion methods that use only respondents with no missing values may yield biased results unless the complete cases can be viewed as a completely random subsample of the original sample. A more realistic approach is to assume that the data are not missing completely at random, but are missing at random as a function of known subject characteristics. Under this more realistic assumption about the missing-data process, Maximum Likelihood methods can be applied to estimate the parameters of interest. The Maximum Likelihood method was used in the present study.

In this study, the Maximum Likelihood estimates of means, variances, and correlations were obtained by implementing the Expectation-Maximization (EM) algorithm suggested by Little & Schluchter (1985). These Maximum Likelihood estimates were compared with the estimates obtained from three ad hoc methods: pairwise deletion, listwise deletion, and weighting analyses.

Although the results show some differences in the correlation estimates, there was little evidence that the methods yield different estimates of proportions, means, and standard deviations. Possible explanations for this result are discussed. In general, however, the ad hoc and Maximum Likelihood methods will not agree.
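
As a rough illustration of the comparison described in the abstract, the Python sketch below runs EM for a multivariate normal model with missing values and contrasts it with listwise deletion on simulated data. It is not the thesis's method: the algorithm cited there handles mixed continuous and categorical variables, whereas this sketch assumes every variable is continuous and jointly normal. The function name `em_mvn`, the simulated data, and the missingness rule (y is dropped whenever x > 0) are all assumptions made for illustration.

```python
import numpy as np


def em_mvn(X, n_iter=500, tol=1e-8):
    """EM estimates of the mean and covariance of a multivariate normal.

    Missing entries are coded as NaN and assumed to be missing at random.
    Illustrative sketch only; not the mixed-data algorithm used in the thesis.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    # Starting values: available-case means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        sum_x = np.zeros(p)
        sum_xx = np.zeros((p, p))
        for i in range(n):
            m, o = miss[i], ~miss[i]
            x = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of the missing part
            if m.any():
                if o.any():
                    # E-step: regress the missing block on the observed block.
                    b = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)])
                    x[m] = mu[m] + (x[o] - mu[o]) @ b
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - sigma[np.ix_(m, o)] @ b
                else:
                    x[m] = mu[m]
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)]
            sum_x += x
            sum_xx += np.outer(x, x) + c
        # M-step: update parameters from the expected sufficient statistics.
        mu_new = sum_x / n
        sigma_new = sum_xx / n - np.outer(mu_new, mu_new)
        change = max(np.abs(mu_new - mu).max(), np.abs(sigma_new - sigma).max())
        mu, sigma = mu_new, sigma_new
        if change < tol:
            break
    return mu, sigma


# Simulated example (hypothetical data): y is unobserved whenever x > 0,
# so missingness depends only on the fully observed x (missing at random).
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)
data[data[:, 0] > 0, 1] = np.nan

complete = data[~np.isnan(data).any(axis=1)]  # listwise deletion
mu_em, sigma_em = em_mvn(data)

print("listwise mean:", complete.mean(axis=0))
print("EM mean:      ", mu_em)
print("listwise corr:", np.corrcoef(complete, rowvar=False)[0, 1])
print("EM corr:      ", sigma_em[0, 1] / np.sqrt(sigma_em[0, 0] * sigma_em[1, 1]))
```

Under this simulated mechanism the complete cases are not a random subsample, so the listwise mean of y should be pulled below its true value of 0, while the EM estimate uses the observed x values to adjust for the missing y values.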


Bibliographic record

Author: Na, Hazon.
Author affiliation: City University of New York.
Degree-granting institution: City University of New York.
Subjects: Educational psychology; Educational tests and measurements; Statistics.
Degree: Ph.D.
Year: 1992
Pages: 97 p.
Original format: PDF
Language: English

