【24h】

Fragmented BWT: An Extended BWT for Full-Text Indexing

机译:碎片化的BWT:用于全文索引的扩展BWT

获取原文

摘要

This paper proposes Fragmented Burrows Wheeler Transform (FBWT), an extension to the well-known BWT structure for full-text indexing and searching. A FBWT consists of a number of BWT fragments each covering only a subset of all the suffixes of the original string. As constructing FBWT does not entail building the BWT of the whole string, it is faster than constructing BWT. On the other hand, searching with FBWT can be more costly than that with BWT, since searching the former requires searching all fragments; its amount of work is O(dp+ occ log~(1+∈) n) as opposed to O(p+ occ log~(1+∈) n) of regular BWT, where p is the length of the query string, n the length of the original text, occ the occurrences of the query string, and d the number of fragments. To compensate the search cost, searching with FBWT can be accelerated with SIMD instructions by searching multiple fragments in parallel. Experiments show that building FBWT is about twice as fast as building BWT via a state of the art algorithm (SA-IS); and that FBWT's search performance compared to BWT's depends on the number of occurrences, ranging from four times slower than BWT (when there are few occurrences), to twice as fast as BWT (when there are many).
机译:本文提出了分段式掘洞惠勒变换(FBWT),它是用于全文索引和搜索的著名BWT结构的扩展。 FBWT由许多BWT片段组成,每个片段仅覆盖原始字符串所有后缀的子集。由于构造FBWT不需要构建整个字符串的BWT,因此它比构造BWT更快。另一方面,使用FBWT进行搜索可能比使用BWT进行搜索更为昂贵,因为搜索前者需要搜索所有片段。它的工作量是O(dp + occ log〜(1 +∈)n),而不是常规BWT的O(p + occ log〜(1 +∈)n),其中p是查询字符串的长度,n原始文本的长度,出现的查询字符串和d的碎片数。为了补偿搜索成本,可以通过并行搜索多个片段来使用SIMD指令来加速FBWT的搜索。实验表明,通过最先进的算法(SA-IS),构建FBWT的速度大约是构建BWT的速度的两倍;并且FBWT与BWT相比的搜索性能取决于出现的次数,范围从比BWT慢四倍(出现的次数很少)到两倍于BWT(出现的次数很多)的情况。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号