Journal: Bioinformatics

Fingerprinting protein structures effectively and efficiently.

Abstract

MOTIVATION: One common task in structural biology is to assess the similarities and differences among protein structures. A variety of structure alignment algorithms and programs have been designed and implemented for this purpose. A major drawback of existing structure alignment programs is that they require a large amount of computational time, which makes them infeasible for pairwise alignments on large collections of structures. To overcome this drawback, a fragment alphabet learned from known structures has been introduced. That method, however, considers only local similarity, and therefore occasionally assigns high scores to structures that are similar only in local fragments.

METHOD: We propose a novel approach that eliminates false positives by comparing both local and remote similarity, with little compromise in speed. Two kinds of contact libraries (ContactLib) are introduced to fingerprint protein structures effectively and efficiently. Each contact group in a contact library consists of either one local fragment or two remote fragments and is represented by a concise vector. These vectors are then indexed and used to calculate a new combined hit-rate score that identifies similar protein structures effectively and efficiently.

RESULTS: We tested our method on the high-quality protein structure subset of SCOP30, which contains 3297 protein structures. For each protein structure in the subset, we retrieved its neighboring protein structures from the rest of the subset. The best area under the receiver operating characteristic (ROC) curve achieved by ContactLib is as high as 0.960, a significant improvement over 0.747, the best result achieved by FragBag. We also demonstrated that incorporating remote contact information is critical for consistently retrieving accurate neighbor protein structures for all query protein structures.

AVAILABILITY AND IMPLEMENTATION: https://cs.uwaterloo.ca/~xfcui/contactlib/
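The abstract describes the retrieval pipeline only at a high level (contact groups, concise vectors, an index, a combined hit-rate score, and ROC-based evaluation) and does not define the vectors or the score. The Python sketch below illustrates that general pattern under stated assumptions; it is not ContactLib's implementation. Everything concrete in it is hypothetical: each structure is reduced to two sets of hashable keys ("local" for single-fragment groups, "remote" for fragment-pair groups), the hit rate is assumed to be the fraction of the query's keys that a target shares, the combined score is assumed to be a simple weighted mean with an illustrative `local_weight` of 0.5, and the toy fingerprints are invented. The helper names `build_index`, `hit_rates`, `combined_score`, and `roc_auc` are illustrative only.

```python
from collections import defaultdict

# Hypothetical sketch of an inverted-index fingerprint search in the spirit of
# the abstract. ASSUMPTIONS (not from the paper): each structure is a dict of
# key sets, one per contact kind ("local" = single fragment, "remote" =
# fragment pair); hit rate = fraction of the query's keys a target shares;
# combined score = weighted mean of the per-kind hit rates.


def build_index(fingerprints):
    """Map (kind, key) -> set of structure ids whose fingerprint contains it."""
    index = defaultdict(set)
    for sid, groups in fingerprints.items():
        for kind, keys in groups.items():
            for key in keys:
                index[(kind, key)].add(sid)
    return index


def hit_rates(query, index, target_ids):
    """Per target and per kind, the fraction of the query's keys the target shares."""
    shared = {sid: defaultdict(int) for sid in target_ids}
    for kind, keys in query.items():
        for key in keys:
            for sid in index.get((kind, key), ()):
                if sid in shared:
                    shared[sid][kind] += 1
    return {
        sid: {kind: counts[kind] / len(keys) for kind, keys in query.items() if keys}
        for sid, counts in shared.items()
    }


def combined_score(rates, local_weight=0.5):
    """Hypothetical combination: weighted mean of local and remote hit rates."""
    return local_weight * rates.get("local", 0.0) + (1 - local_weight) * rates.get("remote", 0.0)


def roc_auc(scores, labels):
    """AUC as the probability a true neighbor outscores a non-neighbor (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: "a" shares local and remote contacts with the query (true
    # neighbor); "b" matches only local fragments (the false-positive case);
    # "c" shares nothing.
    fingerprints = {
        "a": {"local": {"L1", "L2", "L3"}, "remote": {"R1", "R2"}},
        "b": {"local": {"L1", "L2", "L3"}, "remote": {"R9"}},
        "c": {"local": {"L7"}, "remote": {"R8"}},
    }
    query = {"local": {"L1", "L2", "L3"}, "remote": {"R1", "R2"}}
    labels = {"a": True, "b": False, "c": False}

    targets = sorted(fingerprints)
    rates = hit_rates(query, build_index(fingerprints), targets)
    local_only = [rates[t]["local"] for t in targets]
    combined = [combined_score(rates[t]) for t in targets]
    y = [labels[t] for t in targets]
    print("local-only AUC:", roc_auc(local_only, y))  # 0.75
    print("combined AUC:", roc_auc(combined, y))      # 1.0
```

On this invented data, a target that matches the query only in local fragments ties the true neighbor under the local-only score (AUC 0.75) but falls below it once remote contacts are included (AUC 1.0), mirroring the false-positive problem the abstract describes. These numbers come from the toy example, not from the paper's SCOP30 experiment.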