Journal of Molecular Biology

Dynamic sequence databank searching with templates and multiple alignment



Abstract

Sequence databank searches are often performed iteratively, taking the results of a search to form a probe (either a pattern or profile) for a subsequent scan of the databank. The advantage of this approach is that, as more sequences are drawn into the probe, it should, in principle, be possible to detect increasingly distant members of the family. This approach works well when supervised by an "expert" who has a good "eye" for the quality of the sequence alignment and for whether novel matches should be rejected or incorporated into the probe. However, all attempts to automate the process have proved difficult, as the process is inherently unstable. Errors in the alignment, or the misalignment of a non-family member, lead to a deterioration of the probe specificity, which allows further incorrect sequences to be identified. Here, a combination of two methods is used to provide a check on such instability. A pattern matching (template) search method is used (with a BLAST-like pre-filter for speed) to return sequence segments for alignment in a standard multiple alignment program (MULTAL). Sequences are aligned only to a fixed limit of similarity, and any sequences or sub-families that have not joined the original "seed" family are rejected. The remaining core family then provides the basis for a subsequent pattern derivation and databank search. The constant check by the multiple alignment phase allows the search phase to be pushed continually towards the boundary of similarity. This is maintained by lowering the cutoff on the scores of acceptable sequences each time the family remains the same over successive search cycles. The procedure was observed to be stable under misalignments and to have an ability to recognise distantly related family members across super-families that was comparable to PSI-BLAST. The method is applied to the analysis of the hormone-binding domains of the insulin and related growth-factor receptors. (C) 1998 Academic Press. [References: 24]
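The control loop described in the abstract (search, check against the seed family, lower the cutoff when the family stops changing) can be illustrated with a small toy sketch. Everything here is an assumption made for illustration: the 3-mer Jaccard score merely stands in for the template search, positional identity stands in for the MULTAL alignment check, and the function names, thresholds, and sequences are hypothetical rather than taken from the paper.

```python
from typing import Dict, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """All overlapping k-mers of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_score(a: str, b: str) -> float:
    """Toy stand-in for the pattern search: Jaccard overlap of 3-mer sets."""
    ka, kb = kmers(a), kmers(b)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def identity(a: str, b: str) -> float:
    """Toy stand-in for the alignment check: fraction of identical positions."""
    longest = max(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / longest if longest else 0.0


def search(db: Dict[str, str], family: Set[str], cutoff: float) -> Set[str]:
    """Search phase: every entry scoring >= cutoff against any family member."""
    return {sid for sid, seq in db.items()
            if any(search_score(seq, db[m]) >= cutoff for m in family)}


def core_family(db: Dict[str, str], candidates: Set[str],
                seed: Set[str], join_limit: float) -> Set[str]:
    """Check phase: grow outwards from the seed, joining a candidate only if it
    reaches join_limit identity with a current core member. Candidates that
    never join the seed family are rejected."""
    core = set(seed)
    grew = True
    while grew:
        grew = False
        for sid in sorted(candidates - core):
            if any(identity(db[sid], db[m]) >= join_limit for m in core):
                core.add(sid)
                grew = True
    return core


def iterative_search(db: Dict[str, str], seed: Set[str],
                     start_cutoff: float = 0.7, min_cutoff: float = 0.2,
                     step: float = 0.25, join_limit: float = 0.6,
                     max_cycles: int = 50) -> Set[str]:
    """Alternate search and check; lower the cutoff whenever the family is stable."""
    family = set(seed)
    cutoff = start_cutoff
    for _ in range(max_cycles):
        hits = search(db, family, cutoff) | set(seed)
        new_family = core_family(db, hits, seed, join_limit)
        if new_family == family:
            if cutoff <= min_cutoff:
                break  # stable at the lowest allowed cutoff: stop
            cutoff = max(min_cutoff, cutoff - step)
        family = new_family
    return family


# Hypothetical toy databank (not real protein sequences).
DB = {
    "seed": "ACDEFGHIKL",
    "near": "ACDEFGHIKV",
    "far": "ACDEYGHIRV",
    "decoy": "ACDEFWWWWW",      # shares a short motif but fails the check
    "unrelated": "MNPQRSTVWY",
}

print(sorted(iterative_search(DB, {"seed"})))
```

In this toy run the decoy is returned by the search once the cutoff has dropped far enough, but it never reaches the fixed join limit against the seed family, so the check phase rejects it and it cannot contaminate the next probe.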
