首页> 外文期刊>IEEE Transactions on Knowledge and Data Engineering >Indexing high-dimensional data for efficient in-memory similarity search
【24h】

Indexing high-dimensional data for efficient in-memory similarity search

机译:索引高维数据以进行有效的内存相似度搜索

获取原文
获取原文并翻译 | 示例

摘要

In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., >32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. We present a novel index structure, called /spl Delta/-tree, to speed up the high-dimensional query in main memory environment. The /spl Delta/-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called /spl Delta//sup +/-tree, that globally clusters the data space and then partitions clusters into small regions. The /spl Delta//sup +/-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the /spl Delta//sup +/-tree is superior in most cases.
机译:在主存储器系统中,L2高速缓存通常采用32-128字节的高速缓存行大小。与高维度数据(例如> 32D)相比,这些值相对较小。结果是,最小化缓存未命中的现有技术(针对低维数据)不再有效。我们提出了一种新颖的索引结构,称为/ spl Delta / -tree,以加快主内存环境中的高维查询。 / spl Delta /-树是一个多级结构,其中每个级别代表不同维数的数据空间:维数向叶级别增加。其余尺寸使用主成分分析获得。树的每个级别用于更有效地修剪搜索空间,因为较低的维度可以减少距离计算并更好地利用较小的高速缓存行大小。另外,自顶向下的聚类方案可以捕获数据集的特征,因此减少了搜索空间。我们还提出了一个扩展,称为/ spl Delta // sup +/- tree,该扩展可以全局地对数据空间进行聚类,然后将聚类划分为较小的区域。 / spl Delta // sup +/-树可以进一步减少计算成本和缓存未命中。我们进行了广泛的实验,以评估针对不同类型数据集的现有技术所提出的结构。我们的结果表明,在大多数情况下,/ spl Delta // sup +/-树更为出色。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号