
An Efficient Index Structure for String Databases



Abstract

We consider the problem of substring searching in large databases. Typical applications include genetic data, web data, and event sequences. Because such databases grow exponentially in size, in-memory algorithms become impractical for these problems. In this paper, we propose mapping the substrings of the data into an integer space with the help of wavelet coefficients. We then index these coefficients using MBRs (Minimum Bounding Rectangles). We define a distance function that is a lower bound on the actual edit distance between strings. We experiment with both nearest-neighbor queries and range queries. The results show that our technique prunes a significant portion of the database (typically 50-95%), thus significantly reducing both the disk I/O cost and the CPU cost.
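
The pruning idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the construction, so this sketch assumes one simple lower bound: a distance between character-count (frequency) vectors, which can never exceed the edit distance because a single insert, delete, or substitute changes at most one count by +1 and at most one count by -1. It does not reproduce the wavelet transform or the MBR index. The function names, the `ACGT` alphabet, and the sample strings are hypothetical.

```python
from collections import Counter

def frequency_vector(s, alphabet="ACGT"):
    """Map a string to integer counts of each alphabet symbol (assumed mapping)."""
    counts = Counter(s)
    return [counts[ch] for ch in alphabet]

def frequency_distance(u, v):
    """Lower bound on edit distance, computed from two frequency vectors."""
    surplus = sum(a - b for a, b in zip(u, v) if a > b)
    deficit = sum(b - a for a, b in zip(u, v) if b > a)
    return max(surplus, deficit)

def edit_distance(a, b):
    """Standard dynamic-programming edit distance (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def range_query(query, candidates, radius, alphabet="ACGT"):
    """Return strings within `radius` edits of `query`, plus the number pruned.

    A candidate is skipped without computing the edit distance whenever the
    cheap lower bound already exceeds the radius.
    """
    qv = frequency_vector(query, alphabet)
    hits, pruned = [], 0
    for s in candidates:
        if frequency_distance(qv, frequency_vector(s, alphabet)) > radius:
            pruned += 1
            continue
        if edit_distance(query, s) <= radius:
            hits.append(s)
    return hits, pruned
```

For example, `range_query("ACGT", ["ACGA", "TTTT", "ACGT", "GGGGCCCC"], 1)` evaluates the full edit distance only for the two candidates whose lower bound is within the radius. A nearest-neighbor query could presumably use the same bound to skip candidates that cannot beat the best match found so far.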
