Venue: International Conference on IT Convergence and Security
An Improved SVM-T-RFE Based on Intensity-Dependent Normalization for Feature Selection in Gene Expression of Big-Data

Abstract

Thanks to the Next-Generation Sequencing (NGS) revolution, high-throughput RNA sequencing (RNA-seq) has become a highly sensitive and accurate method of measuring gene expression. Because RNA-seq generates a huge amount of data, researchers have struggled with a lack of computational methods able to exploit RNA-seq Big-Data. In most cases, existing methods have not provided an adequate feature scaling scheme for RNA-seq Big-Data. RNA-seq encourages computational biologists to identify both novel and well-known features, and it has led both to wider adoption of earlier methods and to the development of new, scalable data analysis methods. It has also brought recognition to some deep learning methods that are scalable and adaptable for selecting highly correlated genes for classification and prediction. However, some assumptions of those methods are not always correct, and the methods have been considered unstable for large-scale gene expression profiling. We therefore propose an improved feature selection technique that combines the well-known support vector machine recursive feature elimination (SVM-RFE) with T-statistics, based on intensity-dependent normalization, which uses the log differential expression ratio (M vs. A plot) to improve scalability. In each iteration of SVM-RFE, the less dominant features with respect to relevance and redundancy are excluded from the current feature set. In the proposed algorithm, the most relevant and least redundant features are included in the final feature set, achieving comparable accuracy with small subsets of Big-Data such as NCBI-GEO. The proposed algorithm is compared with the existing one on several known data sets. The results show that the proposed algorithm is more convenient and faster than the previous one, because it is implemented entirely with functions from an R package, and that it reduces running time on Big-Data.
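
The two preprocessing ingredients named in the abstract, the M vs. A transform and the T-statistic, can be illustrated with a minimal sketch. This is not the authors' R implementation: the function names are hypothetical, and the intensity-dependent correction here subtracts a per-bin median of M along A, a deliberately crude stand-in for the loess fit that is commonly used for this kind of normalization. The SVM-RFE step itself is not shown.

```python
import math
from statistics import mean, median, variance


def ma_values(x, y):
    """Return (M, A) for two strictly positive expression vectors.

    M is the log2 ratio and A is the mean log2 intensity, per gene.
    """
    m = [math.log2(p) - math.log2(q) for p, q in zip(x, y)]
    a = [0.5 * (math.log2(p) + math.log2(q)) for p, q in zip(x, y)]
    return m, a


def intensity_normalize(m, a, n_bins=2):
    """Center M within bins of similar A (assumption: binned median, not loess)."""
    order = sorted(range(len(a)), key=lambda i: a[i])
    size = math.ceil(len(order) / n_bins)
    out = [0.0] * len(m)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        center = median(m[i] for i in idx)
        for i in idx:
            out[i] = m[i] - center
    return out


def welch_t(group1, group2):
    """Welch T-statistic for one gene measured in two classes of samples."""
    se = math.sqrt(variance(group1) / len(group1)
                   + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se


if __name__ == "__main__":
    m, a = ma_values([8, 4], [2, 4])
    print(m, a)
    print(intensity_normalize([1.0, 3.0, 10.0, 12.0], [1, 2, 3, 4]))
    print(welch_t([1, 2, 3], [4, 5, 6]))
```

In a pipeline of the kind the abstract describes, the normalized values would feed the per-gene T-statistic, which would then be combined with the SVM weights when ranking features for elimination; how the two scores are combined is not specified in the abstract and is left out here.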