Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Probe Efficient Feature Representation of Gapped K-mer Frequency Vectors from Sequences Using Deep Neural Networks

Abstract

Gapped k-mer frequency vectors (gkm-fvs) have been proposed for extracting sequence features. Coupled with a support vector machine (gkm-SVM), gkm-fvs have been used to achieve effective sequence-based predictions. However, the heavy computation required for a large kernel matrix prevents gkm-SVM from using large amounts of data, and it is unclear how to combine gkm-fvs with other data sources in the context of a string kernel. On the other hand, the high dimensionality, collinearity, and sparsity of gkm-fvs hinder the use of many traditional machine learning methods without a kernel trick. Therefore, we proposed gkm-DNN, a flexible and scalable framework that learns feature representations from high-dimensional gkm-fvs using deep neural networks (DNNs). We first proposed a more concise version of gkm-fvs, which significantly reduces their dimension. Then, for the first time, we implemented an efficient method to calculate the gkm-fv of a given sequence. Finally, we adopted a DNN model with gkm-fvs as inputs to achieve efficient feature representation and perform a prediction task. As an illustrative application, we took transcription factor binding site prediction and applied gkm-DNN to 467 small and 69 big human ENCODE ChIP-seq datasets to demonstrate its performance, comparing it with the state-of-the-art method gkm-SVM.
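To make the notion of a gapped k-mer frequency vector concrete, the following is a minimal Python sketch of the usual definition: slide a window of length l over the sequence, keep k informative positions, and mask the remaining l - k positions as gaps. It is a naive illustration only, not the concise gkm-fv variant or the efficient algorithm described in the abstract. The function name `gapped_kmer_counts`, the parameters `l` and `k`, and the `-` gap symbol are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l=4, k=2):
    """Count gapped k-mers in a DNA sequence (naive, illustrative version).

    Each length-l window contributes one count per choice of k informative
    positions; the other l - k positions are masked with '-'.
    """
    seq = seq.upper()
    # Every way of choosing which k of the l positions stay informative.
    masks = list(combinations(range(l), k))
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        for mask in masks:
            key = "".join(window[i] if i in mask else "-" for i in range(l))
            counts[key] += 1
    return counts


counts = gapped_kmer_counts("ACGTAC", l=4, k=2)
for gapped_kmer, n in sorted(counts.items()):
    print(gapped_kmer, n)
```

Under this definition the full feature space has C(l, k) · 4^k entries (96 for l = 4, k = 2), which grows quickly with l and k and illustrates why the abstract stresses the high dimensionality and sparsity of gkm-fvs.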
