Conference: International Conference on Materials, Alloys and Experimental Mechanics

A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets



Abstract

This paper focuses on developing a Hadoop-based framework of feature selection and classification models for classifying high-dimensional data in heterogeneous biomedical databases. Extensive research has been carried out in machine learning, big data, and data mining to identify patterns, and the main challenge is extracting useful features from data generated by diverse biological systems. The proposed model can be used to predict diseases in various applications and to identify the features relevant to particular diseases. Because biomedical repositories such as PubMed and Medline are growing exponentially, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results because of outliers and missing values. In this paper, we propose a two-phase MapReduce framework consisting of a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method is designed to eliminate irrelevant features, missing values, and outliers from the biomedical data. In the second phase, a MapReduce-based multi-class ensemble decision tree model is designed and applied to the preprocessed mapper output to improve the true positive rate and reduce computation time. Experimental results on complex biomedical datasets show that the proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
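The abstract outlines the two phases only at a high level. The following is a minimal, single-process Python sketch that imitates that flow under stated assumptions; it is not the authors' implementation and uses no Hadoop API. Everything specific here is a hypothetical choice: records are `(feature vector, label)` pairs with `None` marking a missing value; an outlier is any value outside a caller-supplied per-feature range; "irrelevant" features are approximated as features that are constant across complete records; and the ensemble is one small Gini-based (CART-style) decision tree per contiguous data partition, combined by majority vote. Function names such as `preprocess_mapper`, `run_pipeline`, and `predict` are illustrative only.

```python
# Hypothetical single-process sketch of a two-phase, MapReduce-style pipeline.
# Not the authors' code; no Hadoop APIs are used. Labels are assumed to be strings.
from collections import Counter


def informative_features(records):
    """Indices of features that vary across complete records (constant features carry no signal)."""
    complete = [f for f, _ in records if None not in f]
    return [j for j in range(len(complete[0])) if len({f[j] for f in complete}) > 1]


def preprocess_mapper(records, bounds, keep):
    """Phase 1 map step: skip records with missing or out-of-range values, project onto `keep`."""
    for features, label in records:
        if None in features:
            continue  # missing value
        if any(not lo <= v <= hi for v, (lo, hi) in zip(features, bounds)):
            continue  # outlier under the assumed per-feature valid range
        yield [features[j] for j in keep], label


def gini(labels):
    """Multi-class Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, depth=0, max_depth=3):
    """Grow a small CART-style tree; leaves are labels, internal nodes are (j, t, left, right)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    best = None
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [r for r in rows if r[0][j] <= t]
            right = [r for r in rows if r[0][j] > t]
            if not right:
                continue  # threshold at the maximum value: not a real split
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return majority  # remaining rows have identical features
    _, j, t, left, right = best
    return (j, t, build_tree(left, depth + 1, max_depth), build_tree(right, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow thresholds from the root until a leaf label is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def run_pipeline(records, bounds, n_partitions=3, max_depth=3):
    """Driver: Phase 1 cleaning, then one tree per contiguous partition (Phase 2 map)."""
    keep = informative_features(records)
    clean = list(preprocess_mapper(records, bounds, keep))
    size = -(-len(clean) // n_partitions)  # ceiling division
    partitions = [clean[i:i + size] for i in range(0, len(clean), size)]
    trees = [build_tree(p, max_depth=max_depth) for p in partitions]
    return keep, trees


def predict(model, features):
    """Phase 2 reduce step: majority vote over the ensemble members' predictions."""
    keep, trees = model
    x = [features[j] for j in keep]
    votes = Counter(predict_tree(tree, x) for tree in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two informative features, one constant feature, one missing value, one outlier.
    data = [([1.0, 1.0, 7.0], "A"), ([5.0, 1.0, 7.0], "B"), ([5.0, 5.0, 7.0], "C"),
            ([1.2, 1.1, 7.0], "A"), ([5.2, 1.2, 7.0], "B"), ([5.1, 5.3, 7.0], "C"),
            ([None, 1.0, 7.0], "A"), ([99.0, 1.0, 7.0], "B")]
    model = run_pipeline(data, bounds=[(0.0, 10.0)] * 3, n_partitions=2)
    print(predict(model, [5.1, 4.9, 7.0]))  # prints C
```

On a real cluster, each partition would plausibly correspond to an input split handled by a separate map task, and the vote would be computed in a reduce step; here both run in one process so the logic can be checked directly. Training one tree per partition keeps each member's work local to its split, which is one common way to parallelize an ensemble, though the paper may combine its trees differently.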
