Published in: BMC Genomics

Improving protein domain classification for third-generation sequencing reads using deep learning



Abstract

Third-generation sequencing (TGS) technologies can produce DNA reads ranging from tens to hundreds of kb in length. These long reads allow protein domain annotation without assembly and can thus yield important insights into the biological functions of the underlying data. However, the high error rate of TGS data poses a new challenge to established domain analysis pipelines. State-of-the-art methods are not optimized for noisy reads and show unsatisfactory domain classification accuracy on TGS data, so new computational methods are needed to improve domain prediction in long noisy reads. In this work, we introduce ProDOMA, a deep learning model that performs domain classification for TGS reads. It uses deep neural networks with 3-frame translation encoding to learn conserved features from partially correct translations. In addition, we formulate the task as an open-set problem, so the model can reject reads that do not contain the targeted domains. In experiments on simulated long reads of protein-coding sequences and on real TGS reads from the human genome, our model outperforms HMMER and DeepFam on protein domain classification. In summary, ProDOMA is a useful end-to-end protein domain analysis tool for long noisy reads that does not rely on error correction.
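The two ideas named in the abstract, 3-frame translation and open-set rejection, can be illustrated with a minimal Python sketch. This is not ProDOMA's implementation: the function names (`translate`, `three_frame_translation`, `classify_open_set`), the use of `X` for unknown codons, and the fixed probability threshold are all assumptions made for illustration, and the abstract does not say how the model actually scores or rejects reads.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate one forward reading frame (0, 1, or 2) of a DNA string.

    Codons with characters outside ACGT become 'X' (an illustrative choice);
    a trailing incomplete codon is dropped.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def three_frame_translation(dna):
    """Return the three forward-frame translations of a read."""
    return [translate(dna, frame) for frame in range(3)]


def classify_open_set(probs, threshold=0.5):
    """Hypothetical open-set rule: return the index of the most probable
    class, or None (reject) when its probability is below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


clean = "ATGGCCAAAGGG"
noisy = "ATGTGCCAAAGGG"  # same read with one inserted base after ATG
print(three_frame_translation(clean)[0])  # frame 0 of the clean read
print(three_frame_translation(noisy))     # the insertion shifts the frame
```

In this toy read, a single inserted base moves the downstream residues of the original translation from frame 0 into frame 1. That is why a model fed all three frames can still see partially correct translations of a domain despite insertion and deletion errors.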
