Prediction of Child Tumours from Microarray Gene Expression Data Through Parallel Gene Selection and Classification on Spark


Abstract

Microarray gene expression data play a major role in predicting chronic disease at an early stage. They also help to identify the most appropriate drug for treating the disease. Such data are too large in volume to handle easily, and not all gene expressions are needed to predict a disease. Gene selection approaches pick only the genes that play a prominent role in detecting a disease and identifying a drug for it. To handle huge gene expression data, gene selection algorithms can be executed in parallel programming frameworks such as Hadoop MapReduce and Spark. Paediatric cancer is a life-threatening illness that affects children aged 0-14 years. Identifying child tumours at an early stage is essential to save children's lives. The authors therefore investigate paediatric cancer gene data to identify the optimal genes that cause cancer in children. They propose to execute a parallel Chi-Square gene selection algorithm on Spark, evaluate the selected genes using parallel logistic regression and support vector machine (SVM) for binary classification with the Spark Machine Learning library (Spark MLlib), and compare the accuracy of prediction and classification, respectively. The results show that parallel Chi-Square selection followed by parallel logistic regression and SVM provides better accuracy than that obtained with the complete set of gene expression data.
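
The Chi-Square gene selection step named in the abstract can be illustrated with a minimal, single-machine sketch in plain Python. This is not the authors' Spark implementation: the function names, the toy data, and the discretisation of expression levels to 0 (low) and 1 (high) are all assumptions made for illustration. Each gene is scored by the Pearson chi-square statistic of its contingency table against the class label, and the highest-scoring genes are kept.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Pearson chi-square statistic for independence between one
    categorical feature (a discretised gene) and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def select_top_genes(samples, labels, k):
    """Return the indices of the k highest-scoring genes (best first)
    together with the score of every gene."""
    n_genes = len(samples[0])
    scores = [
        chi_square_score([row[g] for row in samples], labels)
        for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 samples x 4 genes, expression discretised
# to 0 (low) or 1 (high); label 1 = tumour, 0 = normal.
samples = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

selected, scores = select_top_genes(samples, labels, k=2)
print(selected)  # indices of the two genes most associated with the label
print(scores)
```

In this toy data, gene 0 tracks the label exactly and gene 2 tracks it partly, so those two are selected; genes 1 and 3 are independent of the label and score zero. In a Spark setting, the per-gene scoring would be distributed across the cluster and the reduced gene set passed on to the classifiers.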

Bibliographic details

Source
International Conference on "Computational intelligence in Data Mining" | 2017 | xix, 847 p. | 11 pages
Authors
Y.V. Lokeswari; Shomona Gracia Jacob
Original format
PDF
Chinese Library Classification
TP311.13-532
Keywords
Parallel gene selection; Chi-Square; Parallel logistic regression; Parallel SVM; Hadoop MapReduce; Spark MLlib

Similar literature

1. Analysis of gene expression in three types of epithelial ovarian tumours using cDNA microarrays and tissue microarrays [J]. Zheng Min, Simon R, Kononen J. Cancer (English Edition). 2004, No. 007
2. Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data [J]. Venkataramana Lokeswari, Jacob Shomona Gracia, Ramadoss Rajavel. Genes and genomics. 2019, No. 11
3. Microarray gene-expression data classification using less gene expressions by combining feature selection methods and classifiers [J]. Aarti Bhalla, R. K. Agrawal. International Journal of Information Engineering and Electronic Business. 2013, No. 5
4. Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data [J]. Juan A. Gomez-Pulido, Jose L. Cerrada-Barrios, Sebastian Trinidad-Amado. BMC Bioinformatics. 2016, No. 1
5. Prediction of Child Tumours from Microarray Gene Expression Data Through Parallel Gene Selection and Classification on Spark [C]. Y.V. Lokeswari, Shomona Gracia Jacob. International Conference on "Computational intelligence in Data Mining". 2017
6. General Penalized Logistic Regression for Gene Selection in High-Dimensional Microarray Data Classification [D]. Bonney, Derrick Kwesi. 2020
7. Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data [O]. Juan A. Gomez-Pulido, Jose L. Cerrada-Barrios, Sebastian Trinidad-Amado. 2016
8. Microarray Gene-expression Data Classification using Less Gene Expressions by Combining Feature Selection Methods and Classifiers [O]. Aarti Bhalla, R. K. Agrawal. 2013
