A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

Abstract

This paper focuses on developing a Hadoop-based framework for feature selection and classification models to classify high-dimensional data in heterogeneous biomedical databases. Extensive research has been carried out in machine learning, big data, and data mining to identify patterns. The main challenge is extracting useful features from data generated by diverse biological systems. The proposed model can be used to predict diseases in various applications and to identify the features relevant to particular diseases. Because biomedical repositories such as PubMed and Medline are growing exponentially, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Extracting key features from unstructured documents often leads to uncertain results due to outliers and missing values. In this paper, we propose a two-phase Map-Reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method is designed to eliminate irrelevant features, missing values, and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model is designed and applied to the preprocessed mapper output to improve the true positive rate and the computation time. The experimental results on complex biomedical datasets show that the proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.

Bibliographic details

Source: International Conference on Materials, Alloys and Experimental Mechanics | 2017 | pp. 798-1610 | 15 pages
Authors: Thulasi Bikku; Dr. N. Sambasiva Rao; Dr. Ananda Rao Akepogu
Original format: PDF
Chinese Library Classification: Engineering materials science
Keywords: Ensemble model; Map-Reduce; Medical databases; Bioinformatics; Textual Decision Patterns

Similar literature

1. A novel multi-class ensemble model based on feature selection using Hadoop framework for classifying imbalanced biomedical data [J]. Thulasi Bikku, N. Sambasiva Rao, Ananda Rao Akepogu. International Journal of Business Intelligence and Data Mining. 2019, Issue 1a2
2. AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning [J]. Taherkhani Aboozar, Cosma Georgina, McGinnity T. M. Neurocomputing. 2020, Issue Sepa3
3. Classifier Selection and Ensemble Model for Multi-class Imbalance Learning in Education Grants Prediction [J]. Sun Yu, Li Zhanli, Li Xuewen. Applied Artificial Intelligence. 2021, Issue 1a4
4. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets [C]. Thulasi Bikku, Dr. N. Sambasiva Rao, Dr. Ananda Rao Akepogu. International Conference on Materials, Alloys and Experimental Mechanics. 2017
5. Classifier design to improve pattern classification and knowledge discovery for imbalanced datasets [D]. Wang, Kun. 2009
6. iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets [O]. Jianhua Jia, Zi Liu, Xuan Xiao. 2016
7. Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset [R]. 2010
