Journal: Nucleic Acids Research

ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data


Abstract

RNA-binding proteins (RBPs) play an important role in post-transcriptional regulation of RNA and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or employ models that are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling, which fully captures the relationship between the RNA sequence and the secondary structure preference of a given RBP. Unlike previous methods, which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM’s model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, recovers known RBP motifs from CLIP-Seq data, and scales linearly with the input size. It is considerably faster than MEMERIS and RNAcontext on large datasets and on par with GraphProt. It is freely available on GitHub and as a Docker image.
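To make the idea of a combined sequence-structure motif concrete, the following minimal Python sketch scores one candidate motif occurrence under a toy HMM. It is an illustration only, not ssHMM's actual implementation or API: the five structural-context labels, the dictionary layout, the uniform placeholder probabilities, and the names `make_toy_model` and `log_likelihood` are all assumptions introduced here. Each motif position has one hidden state per structural context, and each state emits a nucleotide.

```python
import math

NUCLEOTIDES = "ACGU"
# Assumed structural-context alphabet (hypothetical labels for this sketch):
# S = stem, H = hairpin loop, I = internal loop, M = multiloop, E = exterior.
CONTEXTS = "SHIME"


def uniform(keys):
    """Return a uniform probability distribution over the given keys."""
    return {k: 1.0 / len(keys) for k in keys}


def make_toy_model(motif_length):
    """Build a toy model with uniform placeholder parameters (not trained values)."""
    return {
        # Probability of each structural context at the first motif position.
        "start": uniform(CONTEXTS),
        # transition[i][a][b]: context a at position i -> context b at position i + 1.
        "transition": [
            {c: uniform(CONTEXTS) for c in CONTEXTS}
            for _ in range(motif_length - 1)
        ],
        # emission[i][c][n]: nucleotide n emitted at position i in context c.
        "emission": [
            {c: uniform(NUCLEOTIDES) for c in CONTEXTS}
            for _ in range(motif_length)
        ],
    }


def log_likelihood(model, sequence, structure):
    """Log-probability of one motif occurrence given its nucleotides and contexts."""
    length = len(model["emission"])
    if len(sequence) != length or len(structure) != length:
        raise ValueError("sequence and structure must match the motif length")
    logp = math.log(model["start"][structure[0]])
    logp += math.log(model["emission"][0][structure[0]][sequence[0]])
    for i in range(1, length):
        prev, cur = structure[i - 1], structure[i]
        logp += math.log(model["transition"][i - 1][prev][cur])
        logp += math.log(model["emission"][i][cur][sequence[i]])
    return logp


model = make_toy_model(3)
score = log_likelihood(model, "UGC", "HHS")
```

In a Gibbs-sampling training loop, a score of this kind could be used to compare candidate motif start positions within each input sequence before resampling one of them. How ssHMM itself performs that step is not described in the abstract.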
