Conference: International Conference on "Computational intelligence in Data Mining"

Prediction of Child Tumours from Microarray Gene Expression Data Through Parallel Gene Selection and Classification on Spark


Abstract

Microarray gene expression data play a major role in predicting chronic disease at an early stage, and they help identify the most appropriate drug for treating the disease. Such data are very large in volume, yet not every gene expression is needed to predict a disease. Gene selection approaches keep only the genes that play a prominent role in detecting a disease and identifying a drug for it. To handle large gene expression datasets, gene selection algorithms can be run on parallel programming frameworks such as Hadoop MapReduce and Spark. Paediatric cancer is a life-threatening illness that affects children aged 0-14 years, and identifying child tumours at an early stage is essential to saving children's lives. The authors therefore investigate paediatric cancer gene data to identify the optimal genes that cause cancer in children. They propose running a parallel Chi-Square gene selection algorithm on Spark, evaluating the selected genes with parallel logistic regression and a support vector machine (SVM) for binary classification in the Spark Machine Learning library (Spark MLlib), and comparing the resulting prediction and classification accuracy. The results show that parallel Chi-Square selection followed by parallel logistic regression and SVM provides better accuracy than classification on the complete set of gene expression data.
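The Chi-Square gene selection step named in the abstract can be illustrated with a minimal single-machine sketch in plain Python. This is not the authors' Spark implementation and does not use Spark MLlib; it only shows the statistic that such a selector ranks genes by. The median-based binarisation, the function names, the gene names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_score(categories, labels):
    """Pearson chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(categories, labels))  # contingency-table cell counts
    category_totals = Counter(categories)
    label_totals = Counter(labels)
    score = 0.0
    for c, c_total in category_totals.items():
        for y, y_total in label_totals.items():
            expected = c_total * y_total / n  # count expected under independence
            score += (observed[(c, y)] - expected) ** 2 / expected
    return score


def binarise(values):
    """Map expression levels to 'high'/'low' around the upper median (an illustrative assumption)."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return ["high" if v >= median else "low" for v in values]


def select_top_genes(expression, labels, k):
    """Return the names of the k genes with the largest chi-square score."""
    scores = {
        gene: chi_square_score(binarise(values), labels)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 3 genes measured in 6 samples (1 = tumour, 0 = normal).
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [9.1, 8.7, 9.4, 2.0, 2.3, 1.9],  # high in tumour samples only
    "gene_b": [5.0, 1.0, 5.2, 1.1, 5.1, 0.9],  # only loosely tracks the label
    "gene_c": [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],  # nearly constant across samples
}
top_genes = select_top_genes(expression, labels, k=1)
```

In this toy example `gene_a` separates the two classes perfectly and is the gene selected. In the setting the abstract describes, the scoring of each gene is independent of the others, which is what makes the selection step a natural fit for parallel execution on Spark.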
