BMC Bioinformatics

IdentiCS – Identification of coding sequence and in silico reconstruction of the metabolic network directly from unannotated low-coverage bacterial genome sequence

Abstract

Background
A necessary step in a genome-level analysis of cellular metabolism is the in silico reconstruction of the metabolic network from genome sequences. The available methods rely mainly on genome annotation, which involves two successive steps: the prediction of coding sequences (CDSs) and the assignment of their functions. This annotation process is time-consuming, and the available methods often encounter difficulties when dealing with unfinished, error-containing genomic sequence.

Results
This work proposes a fast method that uses unannotated genome sequence to predict CDSs and to reconstruct metabolic networks in silico. Instead of using predicted genes or CDSs to query public databases, entries from public DNA or protein databases are used as queries to search a local database of the unannotated genome sequence. Functions are assigned to the predicted CDSs simultaneously. The well-annotated genome of Salmonella typhimurium LT2 is used as an example to demonstrate the applicability of the method: 97.7% of the CDSs in the original annotation are correctly identified, and the use of the SWISS-PROT-TrEMBL databases resulted in the identification of 98.9% of the CDSs that have EC numbers in the published annotation. Furthermore, two versions of the sequence of the bacterium Klebsiella pneumoniae with different genome coverage (3.9-fold and 7.9-fold, respectively) are examined. The results suggest that 3.9-fold coverage of a bacterial genome can be sufficient for the in silico reconstruction of the metabolic network. Compared with other gene-finding methods such as CRITICA, the method is better suited to exploiting sequences of low genome coverage.

Based on the new method, a program called IdentiCS (Identification of Coding Sequences from Unfinished Genome Sequences) is provided that combines the identification of CDSs with the reconstruction, comparison and visualization of metabolic networks (free to download at http://genome.gbf.de/bioinformatics/index.html).

Conclusions
The reversed querying process and the program IdentiCS allow a fast and adequate prediction of protein coding sequences and a reconstruction of the potential metabolic network from low-coverage genome sequences of bacteria. The new method can accelerate the use of genomic data for studying cellular metabolism.
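
The reversed querying idea described in the Results can be illustrated with a minimal sketch. It is not the IdentiCS implementation: it assumes that database entries have already been searched against the unannotated contigs with some similarity search tool, and that the resulting hits are available as simple records. The `Hit` fields, the score threshold, the overlap rule and all identifiers (`P_A`, `contig1`, and so on) are hypothetical and chosen only for illustration. Each accepted hit yields a CDS prediction and, at the same time, a function assignment taken from the querying database entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One match of a database entry (the query) on the unannotated genome."""
    query_id: str   # hypothetical ID of the database entry used as query
    ec_number: str  # EC number carried by that entry, "" if it has none
    contig: str     # contig of the unfinished genome that was hit
    start: int      # 1-based inclusive coordinates, start <= end
    end: int
    strand: str     # "+" or "-"
    score: float    # similarity score reported by the search tool


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits share at least one base on the same contig and strand.

    Treating hits on opposite strands as non-overlapping is an assumption
    of this sketch, not a rule taken from the paper.
    """
    if a.contig != b.contig or a.strand != b.strand:
        return False
    return min(a.end, b.end) >= max(a.start, b.start)


def predict_cds(hits, min_score=50.0):
    """Greedy selection: keep the best-scoring hit for each genomic region."""
    accepted = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < min_score:
            continue  # too weak to support a CDS prediction
        if not any(overlaps(hit, kept) for kept in accepted):
            accepted.append(hit)  # region and function assigned together
    return sorted(accepted, key=lambda h: (h.contig, h.start))


def ec_numbers(cds_list):
    """Collect the distinct EC numbers, the input for network reconstruction."""
    return sorted({c.ec_number for c in cds_list if c.ec_number})


# Made-up example hits.
hits = [
    Hit("P_A", "2.7.1.1", "contig1", 100, 1000, "+", 300.0),
    Hit("P_B", "2.7.1.2", "contig1", 150, 950, "+", 120.0),  # overlaps P_A
    Hit("P_C", "", "contig1", 1200, 1800, "-", 80.0),        # no EC number
    Hit("P_D", "1.1.1.1", "contig2", 10, 400, "+", 40.0),    # below threshold
]
cds = predict_cds(hits)
print([c.query_id for c in cds])
print(ec_numbers(cds))
```

In this sketch the set of EC numbers returned by `ec_numbers` stands in for the starting point of a metabolic network reconstruction; mapping those numbers onto reactions and pathways is outside its scope.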
