Deep Learning Models for RNA-Protein Binding



Abstract

RNA-binding proteins (RBPs) are crucial biomolecules that fine-tune gene expression in cells. Each RBP prefers to bind a specific RNA subsequence, like a key fitting a lock. Understanding these binding preferences is an important step toward understanding the stages of gene expression in cells and toward addressing several genetic disorders. Humans have thousands of RBPs, and only a small fraction of them are well understood. In this work, we develop deep neural network models that learn binding preferences for a large number of RBPs from high-throughput data, without requiring specific domain knowledge or feature engineering. Deep learning has advanced the state of the art in several fields, such as image classification, speech recognition, and even genomics, and it obviates the need for careful feature engineering by learning useful representations directly from the data. We propose two deep architectures and use them to predict RNA-protein binding. Motivated by recent findings on the importance of RNA secondary structure in RBP binding, we incorporate computationally predicted secondary structure features as input to our models and show that they improve prediction performance. We demonstrate that our models achieve significantly higher correlations on held-out in vitro test data than previous approaches. We show that our models generalize well to in vivo CLIP-seq data and achieve higher median AUCs than other approaches. We demonstrate that our models recover known preferences for proteins such as CPO and VTS1, and we report other proteins for which secondary structure plays an important role in binding. Finally, we demonstrate the strengths of our models relative to other approaches, such as the ability to combine information from distant positions along the input sequence.
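
The abstract does not say how sequence and predicted secondary structure are encoded as model input. One common way to combine them, shown here as a minimal sketch and not as the thesis's actual method, is to one-hot encode each nucleotide and append per-position structure features. The function names and the two-column structure layout (hypothetical paired and unpaired probabilities from some structure predictor) are assumptions made for illustration.

```python
# Minimal sketch: build a per-position feature matrix from an RNA sequence
# plus predicted secondary-structure features. The names and the feature
# layout are illustrative assumptions, not taken from the thesis.

NUCLEOTIDES = "ACGU"


def one_hot_rna(seq):
    """Return one 4-element row per base; unknown bases (e.g. N) are all zeros."""
    rows = []
    for base in seq.upper().replace("T", "U"):
        rows.append([1.0 if base == n else 0.0 for n in NUCLEOTIDES])
    return rows


def encode_with_structure(seq, struct_features):
    """Append per-position structure features to the one-hot sequence rows.

    struct_features holds one list of floats per position, for example
    hypothetical [p_paired, p_unpaired] values from a structure predictor.
    """
    onehot = one_hot_rna(seq)
    if len(struct_features) != len(onehot):
        raise ValueError("need exactly one structure feature row per nucleotide")
    return [row + list(feat) for row, feat in zip(onehot, struct_features)]


if __name__ == "__main__":
    # Made-up structure probabilities, used only to show the output shape.
    example = encode_with_structure("ACGU", [[0.9, 0.1]] * 4)
    print(len(example), len(example[0]))
```

Each position then carries 4 sequence channels plus however many structure channels are supplied, and the resulting matrix can be passed to a convolutional or recurrent network.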

Bibliographic record

  • Author

    Gandhi, Shreshth

  • Author affiliation

    University of Toronto (Canada)

  • Degree-granting institution: University of Toronto (Canada)
  • Subjects: Electrical engineering; Genetics; Computer science; Artificial intelligence
  • Degree: M.A.S.
  • Year: 2017
  • Pages: 31 p.
  • Total pages: 31
  • Original format: PDF
  • Language of text: eng
  • CLC classification
  • Keywords
