Journal of Computational Biology

Improving Bloom Filter Performance on Sequence Data Using k-mer Bloom Filters

Abstract

Using a sequence's k-mer content rather than the full sequence directly has enabled significant performance improvements in several sequencing applications, such as metagenomic species identification, estimation of transcript abundances, and alignment-free comparison of sequencing data. As k-mer sets often reach hundreds of millions of elements, traditional data structures are often impractical for k-mer set storage, and Bloom filters (BFs) and their variants are used instead. BFs reduce the memory footprint required to store millions of k-mers while allowing for fast set containment queries, at the cost of a low false positive rate (FPR). We show that, because k-mers are derived from sequencing reads, the information about k-mer overlap in the original sequence can be used to reduce the FPR by up to 30× with little or no additional memory and with set containment queries that are only 1.3–1.6 times slower. Alternatively, we can leverage k-mer overlap information to store k-mer sets in about half the space while maintaining the original FPR. We consider several variants of such k-mer Bloom filters (kBFs), derive theoretical upper bounds for their FPR, and discuss their range of applications and limitations.
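The overlap idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `BloomFilter`, the helper `kmers`, the function `overlap_query`, the salted BLAKE2b hashing, and all parameter values are assumptions made for illustration. The query accepts a k-mer only if the filter also reports at least one neighbor that overlaps it by k-1 bases, which assumes every stored sequence is longer than k so that each true k-mer has such a neighbor.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over strings, with m bits and h hash functions."""

    def __init__(self, m, h):
        self.m = m
        self.h = h
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Derive h bit positions by salting one hash function (illustrative choice).
        for i in range(self.h):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def kmers(seq, k):
    """Return every length-k substring of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def overlap_query(bf, kmer, alphabet="ACGT"):
    """Accept kmer only if the filter also holds a neighbor sharing k-1 bases.

    A random false positive is unlikely to have a neighbor in the filter,
    so this extra check rejects many of them at the cost of extra lookups.
    """
    if kmer not in bf:
        return False
    has_left = any((c + kmer[:-1]) in bf for c in alphabet)
    has_right = any((kmer[1:] + c) in bf for c in alphabet)
    return has_left or has_right


# Usage: index the 5-mers of one short sequence, then query them back.
k = 5
bf = BloomFilter(m=1 << 16, h=3)
for km in kmers("ACGTACGTTAGC", k):
    bf.add(km)
stored_ok = all(overlap_query(bf, km) for km in kmers("ACGTACGTTAGC", k))
```

Each query costs up to eight additional lookups in the worst case, which is consistent in spirit with the slowdown the abstract mentions, though the figures there come from the authors' own variants and experiments, not from this sketch.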
