
Fast alignment-free sequence comparison using spaced-word frequencies

Abstract

Motivation: Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent.

Results: To reduce the statistical dependency between adjacent word matches, we propose to use ‘spaced words’, defined by patterns of ‘match’ and ‘don’t care’ positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words.

Availability and implementation: Our program is freely available at .

Contact:

Supplementary information: available at Bioinformatics online.
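
To make the notion of a spaced word concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a pattern is written as a string of `1` (match) and `0` (don't care) characters, and it uses plain string slicing rather than the recursive hashing and bit operations mentioned in the abstract. The function names and the choice of Euclidean distance on relative frequencies are illustrative assumptions; the abstract does not specify a distance measure.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced word at each position of seq.

    pattern is a string of '1' (match) and '0' (don't care) characters;
    only characters at match positions are kept. (Hypothetical encoding.)
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_positions)


def spaced_word_frequencies(seq, patterns):
    """Relative spaced-word frequencies, keyed by (pattern, word).

    Passing several patterns gives a simple multiple-pattern variant:
    each pattern contributes its own block of frequencies.
    """
    freqs = {}
    for pattern in patterns:
        counts = Counter(spaced_words(seq, pattern))
        total = sum(counts.values())
        for word, n in counts.items():
            freqs[(pattern, word)] = n / total
    return freqs


def euclidean_distance(f1, f2):
    """Euclidean distance between two sparse frequency dictionaries."""
    keys = set(f1) | set(f2)
    return sqrt(sum((f1.get(k, 0.0) - f2.get(k, 0.0)) ** 2 for k in keys))
```

For example, with the pattern `101` the sequence `ACGTAC` yields the spaced words `AG`, `CT`, `GA` and `TC`: the middle character of each window of length three is ignored. A matrix of pairwise distances computed this way could then be passed to a distance-based tree-building method, which is the kind of use the abstract describes for phylogeny reconstruction.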