Journal: BMC Bioinformatics

A machine learning strategy to identify candidate binding sites in human protein-coding sequence

Abstract

Background: The splicing of RNA transcripts is thought to be partly promoted and regulated by sequences embedded within exons. Known sequences include binding sites for SR proteins, which are thought to mediate interactions between splicing factors bound to the 5' and 3' splice sites. It would be useful to identify further candidate sequences; however, identifying them computationally is hard, since exon sequences are also constrained by their functional role in coding for proteins.

Results: This strategy identified a collection of motifs, including several previously reported splice enhancer elements. Although trained only on coding exons, the model discriminates both coding and non-coding exons from intragenic sequence.

Conclusion: We have trained a computational model able to detect signals in coding exons that seem to be orthogonal to the sequences' primary function of coding for proteins. We believe that many of the motifs detected here represent both binding sites for previously unrecognized proteins that influence RNA splicing and other regulatory elements.