PLoS Computational Biology

DeepG4: A deep learning approach to predict cell-type specific active G-quadruplex regions

Abstract

DNA is a complex molecule carrying the instructions an organism needs to develop, live and reproduce. In 1953, Watson and Crick discovered that DNA is composed of two chains forming a double helix. Later, other DNA structures were discovered and shown to play important roles in the cell, in particular the G-quadruplex (G4). Following genome sequencing, several bioinformatic algorithms were developed to predict G4s that form in vitro from a canonical sequence motif, G-richness and G-skewness, or alternatively from sequence features including k-mers, and more recently with machine/deep learning. Recently, new sequencing techniques were developed to map G4s in vitro (G4-seq) and in vivo (G4 ChIP-seq) at a resolution of a few hundred bases. Here, we propose a novel convolutional neural network (DeepG4) to map cell-type specific active G4 regions (i.e. regions within which G4s form both in vitro and in vivo). DeepG4 predicts active G4 regions in different cell types with high accuracy. Moreover, DeepG4 identifies key DNA motifs that are predictive of G4 region activity. We found that such motifs do not follow the very flexible sequence pattern that current algorithms search for. Instead, active G4 regions are determined by numerous specific motifs. Among those motifs, we identified known transcription factors (TFs) that could play important roles in G4 activity, either directly by contributing to the G4 structures themselves or indirectly by participating in G4 formation in the vicinity. In addition, we used DeepG4 to predict active G4 regions in a large number of tissues and cancers, thereby providing a comprehensive resource for researchers. Availability: https://github.com/morphos30/DeepG4.
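
The sequence-based features mentioned in the abstract (canonical motif, G-richness, G-skewness) can be illustrated with a minimal Python sketch. This is not DeepG4 and not code from the DeepG4 repository. The abstract does not define these features, so the definitions below are assumptions: the canonical motif is taken to be the commonly used pattern of four runs of at least three guanines separated by loops of 1 to 7 bases, G-richness the fraction of G in the sequence, and G-skewness (G − C) / (G + C). The function names are hypothetical.

```python
import re

# Assumed canonical G4 motif: four G-runs (>= 3 G each) separated by
# three loops of 1-7 bases. This definition is not stated in the abstract.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_canonical_g4(seq):
    """Return (start, end) spans of non-overlapping motif matches on the given strand."""
    return [m.span() for m in G4_MOTIF.finditer(seq.upper())]


def g_richness(seq):
    """Fraction of bases that are G (assumed definition)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_skewness(seq):
    """(G - C) / (G + C), or 0.0 when the sequence has no G or C (assumed definition)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


if __name__ == "__main__":
    # Example input: four copies of GGG separated by TTA loops.
    example = "GGGTTAGGGTTAGGGTTAGGG"
    print(find_canonical_g4(example))
    print(g_richness(example), g_skewness(example))
```

A scan like this only searches the strand it is given; the reverse complement would need a separate pass. It also shows the kind of fixed sequence pattern that, according to the abstract, does not by itself determine which G4 regions are active.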
