Cluster Analysis for Optimal Indexing

Abstract

High-dimensional indexing is an important area of current research, especially for range and kNN queries. This work introduces clustering performed specifically to support indexing. The goal is to develop new clustering methods that optimize data partitioning for a specific index tree structure, rather than finding clusters that reflect the data distribution. We focus on iDistance, a state-of-the-art high-dimensional indexing method, and take a basic approach to this new problem. We use spherical clusters in an unsupervised Expectation Maximization (EM) algorithm that depends on local density and cluster overlap. This produces a partitioning of the space that provides balanced segmentation for a B+-tree. We also explore the novel idea of reclustering for a specific indexing method, in which the output of one clustering method is reclustered for use in an index. The algorithms are then tested and evaluated using our error metric and iDistance query performance.
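As background for how a partitioning feeds into iDistance, here is a minimal sketch of the standard iDistance mapping and range search. This is not the paper's clustering or reclustering algorithm, and it rests on several assumptions:

- Euclidean distance is used.
- A given list of reference points stands in for cluster centers.
- Each point is assigned to its nearest reference point.
- A constant `c` exceeds every point-to-reference distance.

The class name `IDistanceSketch` is hypothetical. A sorted Python list searched with `bisect` stands in for a real B+-tree.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IDistanceSketch:
    """Toy iDistance index (hypothetical); a sorted key list replaces the B+-tree."""

    def __init__(self, points: List[Point], refs: List[Point], c: float):
        self.refs = refs
        self.c = c
        self.radius = [0.0] * len(refs)  # distance to the farthest member of each partition
        entries = []
        for p in points:
            d = [dist(p, r) for r in refs]
            i = min(range(len(refs)), key=d.__getitem__)  # assumption: nearest reference point
            if d[i] >= c:
                raise ValueError("c must exceed every point-to-reference distance")
            self.radius[i] = max(self.radius[i], d[i])
            entries.append((i * c + d[i], p))  # one-dimensional iDistance key
        entries.sort()
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q: Point, r: float) -> List[Point]:
        """Return all indexed points within distance r of q."""
        result = []
        for i, ref in enumerate(self.refs):
            dq = dist(q, ref)
            if dq - r > self.radius[i]:
                continue  # query sphere cannot intersect this partition
            # Triangle inequality: matching points satisfy |d(p, ref) - dq| <= r.
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            start = bisect.bisect_left(self.keys, lo)
            end = bisect.bisect_right(self.keys, hi)
            # The key interval yields candidates; filter them by true distance.
            result.extend(p for p in self.points[start:end] if dist(p, q) <= r)
        return result
```

Each partition occupies its own key interval [i·c, i·c + c). The partitioning chosen by the clustering step therefore determines directly how keys spread across the B+-tree, which is where the balanced segmentation described above matters.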
