Journal: Bioinformatics

MinSet: a general approach to derive maximally representative database subsets by using fragment dictionaries and its application to the SCOP database



Abstract

Motivation: The size of current protein databases is a challenge for many bioinformatics applications, in terms of both processing speed and information redundancy. It may therefore be desirable to efficiently reduce the database of interest to a maximally representative subset.

Results: The MinSet method employs a combination of a Suffix Tree and a Genetic Algorithm for the generation, selection and assessment of database subsets. The approach is generally applicable to any type of string-encoded data, allowing for a drastic reduction of the database size whilst retaining most of the information contained in the original set. We demonstrate the performance of the method on a database of protein domain structures encoded as strings. Using the SCOP40 domain database, we translated protein structures into character strings by means of a structural alphabet and extracted optimized subsets according to an entropy score based on a constant-length fragment dictionary. The optimized subsets are therefore maximally representative of the distribution and range of local structures. Subsets containing only 10% of the SCOP structure classes show a coverage of >90% for fragments of length 1-4.
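
As a rough illustration of what a constant-length fragment dictionary, an entropy score and fragment coverage could look like on string-encoded data, here is a minimal Python sketch. It is not the MinSet implementation. The abstract does not define the exact entropy score, so plain Shannon entropy over fragment frequencies is assumed here. A `Counter` stands in for the Suffix Tree, the Genetic Algorithm search is omitted, and the function names and the toy two-letter alphabet are hypothetical.

```python
from collections import Counter
from math import log2


def fragment_counts(strings, k):
    """Count every length-k substring (fragment) across a collection of strings."""
    counts = Counter()
    for s in strings:
        # Strings shorter than k yield an empty range and contribute nothing.
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def shannon_entropy(counts):
    """Shannon entropy in bits of a fragment frequency distribution (assumed score)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts.values())


def coverage(subset, full, k):
    """Fraction of distinct length-k fragments in `full` that also occur in `subset`."""
    full_frags = set(fragment_counts(full, k))
    if not full_frags:
        return 0.0
    sub_frags = set(fragment_counts(subset, k))
    return len(sub_frags & full_frags) / len(full_frags)


# Toy "database" over a made-up two-letter structural alphabet.
database = ["ABAB", "ABBA", "BBBB", "ABAB"]
subset = ["ABAB", "ABBA"]

for k in range(1, 5):
    score = shannon_entropy(fragment_counts(subset, k))
    print(k, round(coverage(subset, database, k), 2), round(score, 3))
```

In this sketch a candidate subset would be scored per fragment length, and a search procedure (a Genetic Algorithm in the paper) would look for subsets that keep these scores high while including as few database entries as possible.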
