首页> 外文会议>Latin American symposium on theoretical informatics >Linear-Time Sequence Comparison Using Minimal Absent Words & Applications
【24h】

Linear-Time Sequence Comparison Using Minimal Absent Words & Applications

机译:使用最少的单词的线性时间序列比较及其应用

获取原文

摘要

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in the sequence. An absent word is minimal if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering all their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences.
机译:序列比较实际上是所有比较基因组分析的前提。通常通过序列比对技术来实现,这在计算上是昂贵的。这导致人们对无比对技术的研究不断增加,这种技术是基于以序列的组成模式来指代序列组成的措施。这些度量(例如q-gram距离)通常在时间上相对于序列的长度呈线性计算。在本文中,我们将重点放在互补的概念上:如何根据序列中未出现的信息有效地比较两个序列。如果某个单词不在序列中出现,则它是该序列中不存在的单词。如果缺席单词的所有适当因素均出现在序列中,则缺席单词最少。在这里,我们提出了第一种线性时间和线性空间算法,通过考虑所有序列的最小缺席词来比较两个序列。在此过程中,我们提出了组合兴趣的结果,并扩展了所提出的技术以比较循环序列。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号