
Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks

Abstract

Accurate computational identification of promoters remains a challenge because these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we use Convolutional Neural Networks (CNNs) to analyze the sequence characteristics of prokaryotic and eukaryotic promoters and to build predictive models for them. We trained a similar CNN architecture on promoters of five distantly related organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The CNN model for Bacillus subtilis promoter identification achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs to identify two well-known promoter classes (TATA and non-TATA promoters). The CNN models recognize these complex functional regions well. For human promoters, the Sn/Sp/CC accuracy of prediction reached 0.95/0.98/0.90 for TATA and 0.90/0.98/0.89 for non-TATA promoter sequences. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 for TATA and 0.94/0.94/0.86 for non-TATA promoters. Thus, the developed CNN models, implemented in the CNNProm program, demonstrate the ability of the deep learning approach to grasp complex promoter sequence characteristics and to achieve significantly higher accuracy than previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. Because the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in the sequences of many other genomes, especially newly sequenced ones. The CNNProm program is available to run on the web server at http://www.softberry.com.
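
The abstract reports Sn, Sp and CC without defining them. Assuming they follow the usual convention in promoter prediction work (Sn as sensitivity, Sp as specificity, CC as the Matthews correlation coefficient), they could be computed from a confusion matrix roughly as sketched below. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (Sn, Sp, CC) from confusion-matrix counts.

    Assumed definitions (not stated in the abstract):
      Sn = TP / (TP + FN)          sensitivity
      Sp = TN / (TN + FP)          specificity
      CC = Matthews correlation coefficient
    A ratio with a zero denominator is reported as 0.0.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    sn = safe_div(tp, tp + fn)
    sp = safe_div(tn, tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = safe_div(tp * tn - fp * fn, denom)
    return sn, sp, cc


if __name__ == "__main__":
    # Made-up counts for illustration only: 100 promoters, 100 non-promoters.
    sn, sp, cc = classification_metrics(tp=90, fp=4, tn=96, fn=10)
    print(f"Sn = {sn:.2f}, Sp = {sp:.2f}, CC = {cc:.2f}")
```

Unlike Sn and Sp, CC depends on the ratio of promoter to non-promoter sequences in the test set, so the same Sn/Sp pair can correspond to different CC values on differently balanced data.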
