BMC Bioinformatics

JACOP: A simple and robust method for the automated classification of protein sequences with modular architecture


Abstract

Background: Whole-genome sequencing projects are rapidly producing an enormous number of new sequences. Consequently, almost every protein family now contains hundreds of members. It has thus become necessary to develop tools that classify protein sequences automatically, quickly and reliably. The difficulty of this task is intimately linked to the mechanisms by which protein sequences diverge, i.e. simultaneous residue substitutions, insertions and/or deletions, and whole-domain reorganisations (duplication, swapping, fusion).

Results: Here we present a novel approach based on random sampling of sub-sequences (probes) from a set of input sequences. The probes are compared to the input sequences, after a normalisation step; the results are used to partition the input sequences into homogeneous groups of proteins. In addition, the method provides information on the diagnostic parts of the proteins. The method was tested on two data sets. The first contains sequences of prokaryotic lyases that could be arranged as a multiple sequence alignment. The second contains all proteins from Swiss-Prot Release 36 with at least one Src homology 2 (SH2) domain, a classical example of proteins with modular architecture.

Conclusion: The outcome of our method is robust and highly reproducible, as shown using bootstrap and resampling validation procedures. The results are essentially coherent with the biology. The method depends solely on well-established, publicly available software and algorithms.
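The probe-sampling idea described in the Results can be illustrated with a minimal sketch. This is not the JACOP implementation: the abstract does not specify the probe length, the comparison software, or the normalisation, so the ungapped identity score, the per-probe z-score rescaling, and all function names below (`sample_probes`, `best_identity`, `score_matrix`, `normalise_columns`) are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics


def sample_probes(seqs, n_probes, probe_len, rng):
    """Draw random sub-sequences (probes) from the input sequences."""
    eligible = [s for s in seqs if len(s) >= probe_len]
    probes = []
    for _ in range(n_probes):
        s = rng.choice(eligible)
        start = rng.randrange(len(s) - probe_len + 1)
        probes.append(s[start:start + probe_len])
    return probes


def best_identity(probe, seq):
    """Assumed stand-in for a real alignment score: the best fraction of
    identical residues over all ungapped placements of probe on seq."""
    k = len(probe)
    if len(seq) < k:
        return 0.0
    best = 0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(probe, seq[i:i + k]))
        best = max(best, matches)
    return best / k


def score_matrix(seqs, probes):
    """Rows are input sequences, columns are probes."""
    return [[best_identity(p, s) for p in probes] for s in seqs]


def normalise_columns(matrix):
    """Assumed normalisation: rescale each probe column to zero mean and
    unit variance; constant columns become all zeros."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [matrix[i][j] for i in range(n_rows)]
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        if sd > 0:
            for i in range(n_rows):
                out[i][j] = (col[i] - mean) / sd
    return out


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    toy = ["MKTAYIAKQRQISFVKSHFSRQ", "MKTAYIAKQRQGSFVKAHFSRQ", "GGGGSGGGGSGGGGSGGGGS"]
    probes = sample_probes(toy, n_probes=5, probe_len=6, rng=random.Random(0))
    profile = normalise_columns(score_matrix(toy, probes))
    for seq, row in zip(toy, profile):
        print(seq, [round(x, 2) for x in row])
```

Each row of the resulting matrix is a per-sequence profile over the probes. Sequences with similar profiles could then be grouped by any standard clustering procedure; which procedure the authors use is not stated in the abstract.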
