首页> 外文期刊>Information and computation >Alignment-free sequence comparison using absent words
【24h】

Alignment-free sequence comparison using absent words

机译:使用无词的无比对序列比较

获取原文
获取原文并翻译 | 示例
           

摘要

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences. We also present an algorithm that, given a word x of length n, computes the largest integer for which all factors of x of that length occur in some minimal absent word of x in time and space O(n). Finally, we show that the known asymptotic upper bound on the number of minimal absent words of a word is tight. (C) 2018 Elsevier Inc. All rights reserved.
机译:序列比较实际上是所有比较基因组分析的前提。通常通过序列比对技术来实现,这在计算上是昂贵的。这导致人们对无序列比对技术的研究增加了,该技术基于在序列的组成模式上指代序列组成的措施。这些度量(例如q-gram距离)通常在时间上相对于序列的长度呈线性计算。在本文中,我们关注互补的思想:如何基于序列中未出现的信息有效地比较两个序列。如果某个单词不在序列中出现,则表示该单词不在序列中。如果缺席单词的所有适当因素均出现在序列中,则缺席单词最少。在这里,我们提出了第一个线性时间和线性空间算法,通过考虑两个序列的所有最小缺失词来比较两个序列。在此过程中,我们提出了组合兴趣的结果,并扩展了所提出的技术以比较循环序列。我们还提出了一种算法,给定长度为x的单词x,该算法计算出最大整数,该长度为x的所有因子都出现在时间和空间O(n)的最小x最小单词中。最后,我们证明了一个单词的最小缺席单词数量的已知渐近上限是紧密的。 (C)2018 Elsevier Inc.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号