Journal: 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology)

A Hypercube High-Dimensional Index Based on Co-Clustering

         

Abstract

Nearest neighbor search over high-dimensional datasets suffers from the well-known "curse of dimensionality". This paper proposes HC2 (hypercube on co-clustering), a high-dimensional index structure based on co-clustering. A co-clustering algorithm first reduces the size and the dimensionality of the data simultaneously, grouping the high-dimensional dataset into a number of lower-dimensional clusters; the spatial region of each cluster is then described by a bounded hypercube. During filter-and-refine query processing, a lower bound on the distance between the query point and each cluster is computed so that clusters can be filtered effectively, which yields fast and lossless similarity search. To make the lower bound approximate the actual distance more tightly, the paper adopts a statistically optimized hypercube description, SOHC2 (statistically optimized hypercube on co-clustering), which shrinks the search space further and, in a statistical sense, generates the fewest candidates for actual distance computation. Theoretical analysis and experimental results both show that SOHC2 clearly outperforms other index methods and is suitable for querying large-scale high-dimensional data; compared with other index structures based on co-clustering, it is up to 3 times faster.
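The filter-and-refine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' HC2 or SOHC2 implementation: the co-clustering step and the statistical optimization are omitted, the clusters are assumed to be given already, and each cluster is described by a plain axis-aligned bounding box over all dimensions as a stand-in for the paper's hypercube description. The names `bounding_box`, `lower_bound`, and `nearest_neighbor` are hypothetical.

```python
import math


def bounding_box(points):
    """Per-dimension (min, max) of a non-empty list of equal-length points."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return lo, hi


def lower_bound(query, box):
    """Euclidean distance from the query to the closest point of the box.

    This never exceeds the distance to any point inside the box,
    so discarding a cluster on this bound cannot lose the true answer.
    """
    lo, hi = box
    gaps = (max(l - q, 0.0, q - h) for q, l, h in zip(query, lo, hi))
    return math.sqrt(sum(g * g for g in gaps))


def nearest_neighbor(query, clusters):
    """Exact 1-NN search; returns (point, distance, points_refined)."""
    # Filter step: one cheap lower bound per cluster, visited in ascending order.
    bounds = sorted(
        (lower_bound(query, bounding_box(c)), i) for i, c in enumerate(clusters)
    )
    best_point, best_dist, refined = None, math.inf, 0
    for bound, i in bounds:
        if bound >= best_dist:
            break  # every remaining cluster has a bound at least this large
        # Refine step: actual distance computations inside the candidate cluster.
        for p in clusters[i]:
            refined += 1
            d = math.dist(query, p)
            if d < best_dist:
                best_point, best_dist = p, d
    return best_point, best_dist, refined


clusters = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (11.0, 12.0)],
    [(5.0, 5.0), (6.0, 5.0)],
]
point, dist, refined = nearest_neighbor((0.9, 1.2), clusters)
```

In this toy run only the first cluster is refined, because the lower bounds of the other two already exceed the best distance found. A tighter region description, which is what SOHC2 aims for according to the abstract, would raise those bounds and prune more clusters.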
