首页> 外文会议>IEEE International Conference on Big Data >K-nearest Neighbor Search by Random Projection Forests
【24h】

K-nearest Neighbor Search by Random Projection Forests

机译:通过随机投影森林进行的K近邻搜索

获取原文

摘要

K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests, rpForests, for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easily run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable.
机译:K近邻(kNN)搜索在许多领域都有广泛的应用,包括数据挖掘,机器学习,统计和许多应用领域。受集成方法的成功和基于树的方法的灵活性的启发,我们提出了用于kNN搜索的随机投影森林rpForests。 rpForests通过汇总一组随机投影树的结果来找到kNN,每个树都是通过一系列精心选择的随机投影递归构造的。 rpForests在kNN丢失率的快速衰减和kNN距离的差异方面实现了卓越的准确性。 rpForests的计算复杂度很低。 rpForests的集成特性使它可以轻松地在多核或集群计算机上并行运行。预计运行时间几乎与内核或机器的数量成反比。通过显示当合奏大小增加时相邻点将被合奏随机投影树分隔的概率的指数衰减,我们给出了理论上的见解。我们的理论可以用来完善树木生长过程中随机投影的选择,实验表明该效果是显着的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号