Journal: IEEE Transactions on NanoBioscience

Discovering Patterns From Sequences Using Pattern-Directed Aligned Pattern Clustering

Abstract

Functional region identification is of fundamental importance for protein sequence analysis. Such knowledge provides better scientific understanding and could assist drug discovery. To date, domain annotation has been one approach, but it relies on existing databases. For de novo discovery, motif discovery locates and aligns locally homologous sub-sequences to obtain a position-weight matrix (PWM), which is a fixed-length representation model, whereas the size of protein functional regions varies. Obtaining a PWM with a width in the optimal range therefore requires a computationally expensive exhaustive search. This paper presents a new method, pattern-directed aligned pattern clustering (PD-APCn), to discover and align patterns in conserved protein functional regions. It uses aligned pattern clusters (APCs) whose patterns have variable length and strong support to direct incremental APC expansion. The expansion allows substitution and frame-shift mutations and continues until a robust termination condition is reached. The concept of a breakpoint gap is introduced to identify spots of mutation, such as substitutions and frame shifts. Experiments on synthetic data sets with different sizes and noise levels showed that PD-APCn outperforms MEME, with much higher recall and F-measure and a computational speed 665 times faster than MEME. When applied to the Cytochrome C and Ubiquitin families, it found all key binding sites within the APCs.
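
To make the fixed-length limitation of a PWM concrete, the following is a minimal Python sketch of how such a matrix can be estimated from aligned sub-sequences. It is not the paper's PD-APCn method and not MEME's implementation; the function name `position_weight_matrix`, the `pseudocount` parameter, and the toy sub-sequences are assumptions made up for illustration.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_weight_matrix(aligned, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Estimate per-column residue probabilities from aligned sub-sequences.

    Hypothetical helper for illustration only. Returns a list with one
    {residue: probability} dict per alignment column.
    """
    if not aligned:
        raise ValueError("need at least one sub-sequence")
    width = len(aligned[0])
    # A PWM has a single fixed width, so every sub-sequence must match it.
    if any(len(seq) != width for seq in aligned):
        raise ValueError("a PWM needs sub-sequences of one fixed length")

    total = len(aligned) + pseudocount * len(alphabet)
    pwm = []
    for col in range(width):
        counts = Counter(seq[col] for seq in aligned)
        # Add-pseudocount smoothing keeps unseen residues above zero.
        pwm.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return pwm


# Toy aligned sub-sequences (invented data, not taken from the paper).
sites = ["CAACH", "CQQCH", "CAGCH", "CSQCH"]
pwm = position_weight_matrix(sites)
```

Because the width is fixed when the matrix is built, a sub-sequence of any other length is rejected. Handling functional regions of varying size therefore means repeating the estimate over many candidate widths, which is the exhaustive search the abstract refers to.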