An Improved SVM-T-RFE Based on Intensity-Dependent Normalization for Feature Selection in Gene Expression of Big-Data

Abstract

Thanks to the Next-Generation Sequencing (NGS) revolution, high-throughput RNA sequencing (RNA-seq) has become a highly sensitive and accurate method of measuring gene expression. Because RNA-seq generates a huge amount of data, researchers have struggled with a lack of computational methods able to exploit RNA-seq Big-Data. In most cases, existing methods do not provide an adequate feature scaling scheme for RNA-seq Big-Data. RNA-seq nevertheless encourages computational biologists to identify both novel and well-known features, and it has led to wider adoption of earlier methods and to the development of new, scalable data analysis methods. It has also drawn attention to some deep learning methods that are scalable and adaptable for selecting highly correlated genes for classification and prediction. However, some assumptions of those methods are not always correct, and the methods are considered unstable for large-scale gene expression profiling. We therefore propose an improved feature selection technique based on the well-known support vector machine recursive feature elimination (SVM-RFE) with T-Statistics, built on intensity-dependent normalization, which uses the log differential expression ratio (M vs A plot) to improve scalability. In each iteration of SVM-RFE, the less dominant feature set with respect to relevance and redundancy is excluded from the current set of features. In the proposed algorithm, the most relevant and least redundant features are included in the final feature set, achieving comparable accuracy with small subsets of Big-Data such as NCBI-GEO. The proposed algorithm is compared with the existing one on several known data sets. The comparison finds that the proposed algorithm is more convenient and faster than the previous one because it relies on functions available in an R package, and that it reduces running time on Big-Data.
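The two statistical ingredients named in the abstract, the M vs A transform and a per-gene T-statistic, can be illustrated with a minimal Python sketch. This is not the authors' R implementation. The function names are hypothetical, the binned-median correction is a simple stand-in for the loess fit normally used in intensity-dependent normalization, and the Welch form of the t-statistic is an assumption. The abstract does not say how the T-statistic is combined with the SVM weights, so that step is left out.

```python
import math
from statistics import mean, median, variance


def ma_values(x, y):
    """Return (M, A) for two equal-length vectors of positive expression values.

    M is the log2 ratio and A is the mean log2 intensity of each gene.
    """
    m = [math.log2(p) - math.log2(q) for p, q in zip(x, y)]
    a = [0.5 * (math.log2(p) + math.log2(q)) for p, q in zip(x, y)]
    return m, a


def normalize_by_intensity(m, a, n_bins=4):
    """Subtract the median M within each intensity (A) bin.

    Assumption: a crude replacement for a loess curve fitted to the MA plot.
    """
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = max(1, math.ceil(len(order) / n_bins))
    out = list(m)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        centre = median(m[i] for i in idx)
        for i in idx:
            out[i] = m[i] - centre
    return out


def welch_t(group1, group2):
    """Welch t-statistic for one gene in two classes (at least 2 samples each)."""
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


# Toy data, invented for illustration only.
sample = [8.0, 16.0, 32.0, 64.0]
reference = [4.0, 8.0, 32.0, 32.0]
m, a = ma_values(sample, reference)
m_norm = normalize_by_intensity(m, a, n_bins=2)
t = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

In a recursive elimination loop, a score such as the absolute value of `welch_t` would be recomputed or reused at each iteration alongside the SVM weights to decide which genes to drop. The weighting between the two is a design choice that the abstract leaves open.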

Bibliographic details

Source: International Conference on IT Convergence and Security | 2017 | xix 350 p. | 8 pages
Authors: Chayoung Kim; Hye-young Kim
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords: Support Vector Machine Recursive Feature Elimination (SVM-RFE); Intensity-dependent normalization (M vs A plot method); T-Statistics; RNA-seq gene expression; Big-Data

Similar literature

1. Senti-CS: Building a lexical resource for sentiment analysis using subjective feature selection and normalized Chi-Square-based feature weight generation [J]. Khan Farhan Hassan, Qamar Usman, Bashir Saba. Expert Systems. 2016, No. 5
2. Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification [J]. Jain Indu, Jain Vinod Kumar, Jain Renu. Applied Soft Computing. 2018
3. Entropy-based feature selection for improved 3D facial expression recognition [J]. Kamil Yurtkan, Hasan Demirel. Signal, Image and Video Processing. 2014, No. 2
4. An Improved SVM-T-RFE Based on Intensity-Dependent Normalization for Feature Selection in Gene Expression of Big-Data [C]. Chayoung Kim, Hye-young Kim. International Conference on IT Convergence and Security. 2017
5. Improving Feature Learning, Feature Selection, and Classification in Facial Expression Analysis [D]. Liu, Ping. 2015
6. Genetic association studies for gene expressions: permutation-based mutual information in a comparison with standard ANOVA and as a novel approach for feature selection [O]. Silke Szymczak, Angelo Nuzzo, Christian Fuchsberger. 2007
