A Missing Data Imputation Approach Using Clustering and Maximum Likelihood Estimation

Abstract

Missing data is a data mining problem that adversely affects data analysis and decision-making, and it is frequently encountered in healthcare data for a variety of reasons. It remains an important research topic because the success of any imputation method depends on many factors, such as the characteristics of the data and the type of missingness. This study proposes an approach to the missing data problem based on clustering and maximum likelihood estimation (MLE). To test the proposed method, the "Mesothelioma" data set, prepared by the Dicle University Medical School and uploaded to the UCI international open-source database, was used. In the first step, new data sets were created from it that follow the missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) patterns. In the second step, these data sets were divided into clusters by a k-means process that excludes the 3 features with missing data, in order to improve the performance of the MLE method. In the last step, the features with missing values were added back to the clusters, the missing values were completed with the MLE method within each cluster, and the clusters were merged to obtain the complete data set. The data sets produced by these three steps (data reduction, clustering, and data completion) were compared with the original data set according to the root mean square error (RMSE) criterion, and an average success of 96.5% was achieved.
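The three steps described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: synthetic two-group data stands in for the Mesothelioma set, only MCAR masking is shown, a single feature (rather than three) has missing values, and "MLE" is simplified to the per-cluster Gaussian maximum likelihood estimate of the mean, which is the sample mean of the observed values. All function names are hypothetical.

```python
import math
import random

def make_mcar(rows, cols, rate, rng):
    """Step 1 (assumed form): blank entries in `cols` with equal probability (MCAR)."""
    out = [list(r) for r in rows]
    for r in out:
        for c in cols:
            if rng.random() < rate:
                r[c] = None
    return out

def kmeans(points, k, rng, iters=50):
    """Step 2: plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(x) / len(members) for x in zip(*members)]
    return labels

def impute_by_cluster(rows, labels, cols):
    """Step 3 (simplified): fill each gap with the cluster's Gaussian MLE of the mean."""
    out = [list(r) for r in rows]
    for c in cols:
        seen = [r[c] for r in rows if r[c] is not None]
        global_mean = sum(seen) / len(seen)
        for j in set(labels):
            obs = [r[c] for r, l in zip(rows, labels)
                   if l == j and r[c] is not None]
            mu = sum(obs) / len(obs) if obs else global_mean
            for r, l in zip(out, labels):
                if l == j and r[c] is None:
                    r[c] = mu
    return out

def rmse(original, imputed, masked, cols):
    """RMSE over the entries that were blanked out in `masked`."""
    errs = [(o[c] - i[c]) ** 2
            for o, i, m in zip(original, imputed, masked)
            for c in cols if m[c] is None]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0

# Synthetic stand-in data: two groups, feature 2 depends on the group.
rng = random.Random(0)
original = [[g + rng.gauss(0, 1), g + rng.gauss(0, 1), 2 * g + rng.gauss(0, 1)]
            for g in (0.0, 5.0) for _ in range(100)]
miss_cols = [2]

masked = make_mcar(original, miss_cols, 0.2, rng)
# Cluster only on the fully observed features, as the abstract describes.
labels = kmeans([[r[0], r[1]] for r in masked], 2, rng)
imputed = impute_by_cluster(masked, labels, miss_cols)
# Baseline for comparison: treat all rows as one cluster (no clustering).
baseline = impute_by_cluster(masked, [0] * len(masked), miss_cols)

print("RMSE with clustering:   ", rmse(original, imputed, masked, miss_cols))
print("RMSE without clustering:", rmse(original, baseline, masked, miss_cols))
```

On data like this, where the incomplete feature depends on group membership, the per-cluster estimate should give a lower RMSE than the single global estimate. A fuller treatment would estimate a multivariate model per cluster and would also cover MAR and MNAR masking.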

Bibliographic details

Source
Medical Technologies National Congress | 2017 | 418p | 4 pages
Authors
Muammer ALBAYRAK; Kemal TURHAN; Burcin KURT
Original format
PDF
Chinese Library Classification
TP39-53
Keywords
Missing data; Clustering; MLE; CMLE approach

