Journal: Genes and genomics

A clustering method for next-generation sequences of bacterial genomes through multiomics data mapping


Abstract

The recent availability of diverse 'omics' data creates new challenges and opportunities for research on the assembly of next-generation sequences. To exploit these opportunities, we developed a next-generation sequence clustering method that focuses on the interdependency between genomics and proteomics data. Assuming that both next-generation read sequences and proteomics data are available for a target species, we mapped the read sequences against protein sequences and identified physically adjacent reads using a machine learning-based read assignment method. We measured the performance of our method using simulated read sequences and collected protein sequences of Escherichia coli (E. coli). Focusing on the actual adjacency of the clustered reads in the E. coli genome, we found that (i) the proposed method improves the performance of read clustering and (ii) the use of proteomics data has the potential to enhance the performance of genome assemblers. These results demonstrate that the integrative approach is effective for accurately grouping adjacent reads in a genome, which will result in better genome assembly.
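The core idea, that reads mapping to the same protein are likely to be physically adjacent in the genome, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the mapping tool or the machine learning-based read assignment step, so the sketch substitutes exact substring matching of six-frame translations. The function names `six_frame_translations` and `cluster_reads_by_protein`, the `min_len` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, listed in TCAG codon order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(read):
    """Translate a DNA read in all six reading frames (3 per strand)."""
    frames = []
    for strand in (read, read.translate(COMPLEMENT)[::-1]):
        for offset in range(3):
            codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
            # Codons with ambiguous bases (e.g. N) become 'X'.
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


def cluster_reads_by_protein(reads, proteins, min_len=5):
    """Group read IDs by the protein that contains one of their translations.

    Assumption: exact substring matching stands in for a real
    read-to-protein aligner and for the paper's learned read assignment.
    """
    clusters = defaultdict(list)
    for read_id, seq in reads.items():
        peptides = [p for p in six_frame_translations(seq) if len(p) >= min_len]
        for prot_id, prot_seq in proteins.items():
            if any(p in prot_seq for p in peptides):
                clusters[prot_id].append(read_id)
    return dict(clusters)


# Toy data: r1 and r2 overlap within the gene for p1, r3 is the reverse
# complement of r1, and r4 is unrelated.
proteins = {"p1": "MKTAYIAKQR"}
reads = {
    "r1": "ATGAAAACCGCGTATATT",
    "r2": "TATATTGCGAAACAGCGT",
    "r3": "AATATACGCGGTTTTCAT",
    "r4": "GGGGGGGGGGGGGGGGGG",
}
clusters = cluster_reads_by_protein(reads, proteins)
print(clusters)
```

In this toy case the three reads derived from the same gene fall into one cluster regardless of strand, while the unrelated read is left unassigned. A real pipeline would need a tolerant aligner to handle sequencing errors and reads that span gene boundaries.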