Journal: Procedia Computer Science

Scalable Information Gain Variant on Spark Cluster for Rapid Quantification of Microarray



Abstract

Microarray technology is an emerging technology in genetic research that researchers often use to monitor the expression levels of genes in a given organism. Microarray experiments have a wide range of applications in the health care sector. The colossal amount of raw gene expression data often leads to computational and analytical challenges, including feature selection and classification of the dataset into the correct group or class. This paper proposes a mutual information feature selection method based on the Spark framework (sf-MIFS) to determine the pertinent features. After feature selection, classifiers based on the Spark framework, namely Logistic Regression (sf-LoR) and Naive Bayes (sf-NB), are applied to classify the microarray datasets. A detailed comparative analysis of execution time and accuracy is presented for the proposed feature selection and classifier methodologies, on the Spark framework and on a conventional system, respectively.
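
The abstract does not spell out how sf-MIFS scores features, so the following is only a minimal single-machine sketch of the general idea of ranking features by their mutual information with the class label. It is not the paper's method and does not use Spark. The function names (`mutual_information`, `discretize`, `top_k_features`), the equal-width binning, and the default of three bins are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y), in bits, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def discretize(values, bins=3):
    """Equal-width binning of continuous values (an assumed preprocessing step)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def top_k_features(rows, labels, k, bins=3):
    """Return the indices of the k features with the highest MI with the labels."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = discretize([row[j] for row in rows], bins)
        scores.append((mutual_information(column, labels), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]
```

Each feature is scored independently of the others here, which is what would make this kind of ranking straightforward to distribute across a cluster; how the paper actually partitions the work on Spark is not stated in the abstract.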
