Conference: IEEE International Conference on Big Data

K-nearest Neighbor Search by Random Projection Forests

Abstract

K-nearest neighbor (kNN) search has wide applications in data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy: both the missing rate of kNNs and the discrepancy in kNN distances decay quickly. rpForests has very low computational complexity. Its ensemble nature makes it easy to run in parallel on multicore or clustered computers, with running time expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing that the probability that neighboring points are separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections during tree growth, and experiments show that the resulting improvement is remarkable.
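The abstract describes the method only at a high level. As a rough illustration of the general idea (not the authors' algorithm), the following minimal Python sketch grows an ensemble of random projection trees. Each tree splits its points recursively at the median of their projections onto a random direction. A query is answered by taking the union of the leaves it falls into across all trees and ranking that candidate set by exact distance. The abstract says the projections are "carefully chosen"; this sketch simply draws uniformly random unit directions and splits at the median, which is an assumption. All function names (`build_rp_tree`, `build_forest`, `forest_knn`, and so on) and parameters (`leaf_size`, `n_trees`) are hypothetical.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project(direction, p):
    """Scalar projection of point p onto a direction vector."""
    return sum(d * x for d, x in zip(direction, p))


def random_direction(dim, rng):
    """Random unit vector: a normalized vector of i.i.d. Gaussians (assumed rule)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def build_rp_tree(points, idx, leaf_size, rng):
    """Recursively split idx at the median of its projection onto a random
    direction. leaf_size must be at least 1 or the recursion never stops."""
    if len(idx) <= leaf_size:
        return ("leaf", idx)
    direction = random_direction(len(points[0]), rng)
    order = sorted(idx, key=lambda i: project(direction, points[i]))
    mid = len(order) // 2
    split = project(direction, points[order[mid]])  # median projection value
    return ("node", direction, split,
            build_rp_tree(points, order[:mid], leaf_size, rng),
            build_rp_tree(points, order[mid:], leaf_size, rng))


def leaf_of(tree, q):
    """Route a query down one tree and return the indices stored in its leaf."""
    while tree[0] == "node":
        _, direction, split, left, right = tree
        tree = left if project(direction, q) < split else right
    return tree[1]


def build_forest(points, n_trees, leaf_size, seed=0):
    """Grow n_trees trees. Each uses its own random directions, so they could be
    grown on separate cores; here they are grown sequentially from one seeded RNG."""
    rng = random.Random(seed)
    all_idx = list(range(len(points)))
    return [build_rp_tree(points, all_idx, leaf_size, rng) for _ in range(n_trees)]


def forest_knn(forest, points, q, k):
    """Aggregate: union the query's leaves across trees, then rank by exact distance."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_of(tree, q))
    return heapq.nsmallest(k, candidates, key=lambda i: dist(points[i], q))


def exact_knn(points, q, k):
    """Brute-force kNN, used only as ground truth."""
    return heapq.nsmallest(k, range(len(points)), key=lambda i: dist(points[i], q))


# Usage: missing rate of the k true neighbours as the ensemble grows.
data_rng = random.Random(42)
data = [[data_rng.random() for _ in range(5)] for _ in range(500)]
query, k = data[0], 10
truth = set(exact_knn(data, query, k))
missing_rate = {}
for n_trees in (1, 5, 20):
    forest = build_forest(data, n_trees, leaf_size=20, seed=1)
    found = set(forest_knn(forest, data, query, k))
    missing_rate[n_trees] = 1 - len(found & truth) / k
    print(f"{n_trees:2d} trees: missing rate {missing_rate[n_trees]:.1f}")
```

In this sketch a true neighbour is missed only if every tree places it in a different leaf from the query. Because the trees are grown from independent random directions, if one tree separates a given neighbour from the query with probability p, all T trees do so with probability p^T. This matches the spirit of the exponential decay the abstract reports, though the paper's own bound may take a different form. With the same seed, the smaller forests above are prefixes of the larger one, so the measured missing rate cannot increase as trees are added.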