Data Compression Conference

Indexing Sequences of IEEE 754 Double Precision Numbers


Abstract

In recent decades, much attention has been paid to developing succinct data structures to store and/or index text, biological collections, source code, etc. Their success was in most cases due to handling data with a relatively small alphabet and to exploiting either a rather skewed distribution (text) or the repetitiveness of the source data (source code repositories, biological sequences of similar individuals). In this work, we address collections of floating point data, which typically have a large alphabet (a real number hardly ever repeats) and a less biased distribution. We present two solutions to store and index such collections. The first is based on the well-known inverted index. It uses space close to the size of the original collection while providing appealing search times. The second uses a wavelet tree, which, at the expense of slower search times, obtains slightly better space consumption.
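To make the first idea concrete, here is a minimal sketch of an inverted index over a sequence of doubles. It is an illustration under stated assumptions, not the structure the paper actually builds: the class name `DoubleInvertedIndex` and its methods `locate`, `locate_range`, and `to_list` are hypothetical, the posting lists are stored uncompressed, and NaN values are rejected.

```python
from bisect import bisect_left, bisect_right


class DoubleInvertedIndex:
    """Toy inverted index: each distinct value maps to its increasing positions.

    Hypothetical illustration only; posting lists are plain Python lists,
    with no compression or succinct encoding.
    """

    def __init__(self, values):
        values = list(values)
        postings = {}
        for pos, v in enumerate(values):
            if v != v:  # NaN never compares equal to itself
                raise ValueError("NaN is not supported in this sketch")
            # Note: 0.0 and -0.0 compare equal, so they share one entry here.
            postings.setdefault(v, []).append(pos)
        self.vocab = sorted(postings)                 # sorted distinct values
        self.lists = [postings[v] for v in self.vocab]
        self.n = len(values)

    def locate(self, x):
        """Return the positions where the exact value x occurs."""
        i = bisect_left(self.vocab, x)
        if i < len(self.vocab) and self.vocab[i] == x:
            return list(self.lists[i])
        return []

    def locate_range(self, lo, hi):
        """Return the sorted positions of all values v with lo <= v <= hi."""
        i = bisect_left(self.vocab, lo)
        j = bisect_right(self.vocab, hi)
        hits = []
        for plist in self.lists[i:j]:
            hits.extend(plist)
        return sorted(hits)

    def to_list(self):
        """Rebuild the original sequence from the vocabulary and postings."""
        out = [0.0] * self.n
        for v, plist in zip(self.vocab, self.lists):
            for p in plist:
                out[p] = v
        return out


idx = DoubleInvertedIndex([3.5, 1.25, 3.5, 2.0, 1.25])
print(idx.locate(3.5))
print(idx.locate_range(1.0, 2.0))
```

Because the sequence can be rebuilt from the vocabulary and the posting lists, the index can stand in for the collection rather than sit beside it. When almost every value is distinct, as the abstract expects for floating point data, the vocabulary is nearly as large as the input, which is consistent with a footprint close to the size of the original collection.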
