Nucleic Acids Research

Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches

Abstract

In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing ‘don't care’ or ‘wildcard’ symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their different spaced-word composition. Our second approach defines the distance between two sequences by estimating, for each position in the first sequence, the length of the longest substring starting at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction. The two alignment-free programmes are accessible through a web interface at the ‘Göttingen Bioinformatics Compute Server (GOBICS)’, and the source code can be downloaded.
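
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names are hypothetical. The binary pattern notation ('1' for a match position, '0' for a don't-care position) and the use of Euclidean distance are assumptions made here for illustration, since the abstract only says that various distance measures can be defined. The k-mismatch part is a brute-force exact computation, whereas the abstract describes an estimate, and it stops at the average substring length without converting it into a distance.

```python
import math
from collections import Counter
from itertools import combinations


def spaced_word_counts(seq, pattern):
    """Count spaced words in seq; pattern uses '1' = match, '0' = don't care."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    for start in range(len(seq) - len(pattern) + 1):
        # Keep only the symbols at match positions; wildcards are skipped.
        counts["".join(seq[start + i] for i in care)] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def euclidean_distance(freq_a, freq_b):
    """Euclidean distance between two frequency profiles (an assumed choice)."""
    words = set(freq_a) | set(freq_b)
    return math.sqrt(
        sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words)
    )


def spaced_word_distance_matrix(seqs, pattern):
    """Pairwise distance matrix from spaced-word frequency profiles."""
    freqs = [relative_frequencies(spaced_word_counts(s, pattern)) for s in seqs]
    n = len(seqs)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = euclidean_distance(freqs[i], freqs[j])
    return matrix


def longest_match_with_mismatches(a, i, b, k):
    """Longest substring of a starting at i that occurs in b with <= k mismatches.

    Brute force over every start position in b (exact, not an estimate).
    """
    best = 0
    for j in range(len(b)):
        mismatches = 0
        length = 0
        while i + length < len(a) and j + length < len(b):
            if a[i + length] != b[j + length]:
                if mismatches == k:
                    break  # the next mismatch would exceed the budget
                mismatches += 1
            length += 1
        best = max(best, length)
    return best


def average_k_mismatch_length(a, b, k):
    """Average, over all positions of a, of the longest k-mismatch match in b."""
    return sum(
        longest_match_with_mismatches(a, i, b, k) for i in range(len(a))
    ) / len(a)


if __name__ == "__main__":
    sequences = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG"]
    print(spaced_word_distance_matrix(sequences, "101"))
    print(average_k_mismatch_length(sequences[0], sequences[1], 1))
```

In this sketch, a longer average k-mismatch substring indicates more similar sequences, so any distance derived from it would need to decrease as that average grows. The brute-force search is quadratic per position and is only meant for short toy inputs.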