Random Hyperplane Projection using Derived Dimensions

Abstract

Computing the similarity between data objects is a fundamental operation for many distributed applications, such as those on the World Wide Web, in Peer-to-Peer networks, or even in Sensor Networks. Locality Sensitive Hashing (LSH) has recently been proposed to reduce the number of bits that must be transmitted between sites in order to evaluate different similarity functions between the data objects. In our work we investigate a particular form of LSH, termed Random Hyperplane Projection (RHP). RHP is a data-agnostic model that works for arbitrary data sets. However, data in most applications is not uniform. We first describe the shortcomings of the RHP scheme, in particular its inability to exploit evident skew in the underlying data distribution, and then propose a novel framework that automatically detects correlations and computes an RHP embedding in the Hamming cube tailored to the provided data set. We further discuss extensions of our framework that cope with changes in the data distribution or with outliers. In such cases our technique automatically reverts to the basic RHP model for data items that cannot be described accurately through the computed embedding. Our experimental evaluation on several real datasets demonstrates that the proposed scheme outperforms the existing RHP algorithm, providing up to three times more accurate similarity computations using the same number of bits.
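
For readers unfamiliar with the baseline, the following is a minimal sketch of the basic, data-agnostic RHP scheme that the abstract takes as its starting point. It is not the derived-dimensions framework proposed in the paper, whose details are not given here. It assumes the standard construction: each bit records on which side of a random Gaussian hyperplane a vector falls, and the fraction of differing bits between two signatures approximates the angle between the vectors divided by pi. All function names and parameters (make_hyperplanes, rhp_signature, num_bits, seed) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with i.i.d. Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def rhp_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [
        1 if sum(r * x for r, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]


def hamming_distance(sig_a, sig_b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def estimated_cosine(sig_a, sig_b):
    """Estimate cosine similarity from two signatures.

    Each bit differs with probability theta / pi, so the angle is
    approximated as pi * (Hamming distance) / (number of bits).
    """
    theta = math.pi * hamming_distance(sig_a, sig_b) / len(sig_a)
    return math.cos(theta)


def exact_cosine(u, v):
    """Exact cosine similarity, used here only as a reference value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Both sites must share the same hyperplanes (here: the same seed).
    planes = make_hyperplanes(num_bits=2048, dim=4, seed=42)
    u = [1.0, 2.0, 0.5, -1.0]
    v = [0.9, 2.1, 0.4, -0.8]
    print("exact:    ", exact_cosine(u, v))
    print("estimated:", estimated_cosine(rhp_signature(u, planes),
                                         rhp_signature(v, planes)))
```

In this sketch only the bit signatures would need to be exchanged between sites, which is the bandwidth saving the abstract refers to. The hyperplanes are drawn without looking at the data, which illustrates the data-agnostic property the paper sets out to improve on.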