Conference: IEEE International Conference on Big Data

Linear-Complexity Relaxed Word Mover's Distance with GPU Acceleration


Abstract

The amount of unstructured text-based data is growing every day. Querying, clustering, and classifying this big data requires similarity computations across large sets of documents. Whereas low-complexity similarity metrics are available, attention has been shifting towards more complex methods that achieve higher accuracy. In particular, the Word Mover's Distance (WMD) method proposed by Kusner et al. is a promising new approach, but its time complexity grows cubically with the number of unique words in the documents. The Relaxed Word Mover's Distance (RWMD) method, also proposed by Kusner et al., reduces the time complexity from cubic to quadratic at the cost of a limited loss in accuracy compared with WMD. Our work contributes a low-complexity implementation of the RWMD that reduces the average time complexity to linear when operating on large sets of documents. Our linear-complexity RWMD implementation, henceforth referred to as LC-RWMD, maps well onto GPUs and can be efficiently distributed across a cluster of GPUs. Our experiments on real-life datasets demonstrate 1) a performance improvement of two orders of magnitude with respect to our GPU-based distributed implementation of the quadratic RWMD, and 2) a performance improvement of three to four orders of magnitude with respect to our distributed WMD implementation that uses GPU-based RWMD for pruning.
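To make the complexity claims concrete, the following is a minimal NumPy sketch, not the authors' GPU implementation. The function `rwmd` computes the quadratic pairwise RWMD in the usual formulation, where each word ships all of its weight to its nearest word in the other document and the larger of the two directional costs is kept. The function `rwmd_one_to_many` illustrates one way the per-document cost could become linear, under the assumption that the saving comes from computing, once per query, the distance from every vocabulary word to its nearest query word and then reusing that vector for all documents. All names here (`rwmd`, `rwmd_one_to_many`, `vocab_emb`, `doc_term`, `query_ids`) are hypothetical and chosen for illustration.

```python
import numpy as np


def pairwise_dist(x, y):
    """Euclidean distances between rows of x (n, dim) and rows of y (m, dim)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rwmd(emb_a, w_a, emb_b, w_b):
    """Quadratic RWMD between two documents.

    emb_a, emb_b: embeddings of the unique words in each document.
    w_a, w_b: normalized word weights (each vector sums to 1).
    """
    cost = pairwise_dist(emb_a, emb_b)  # shape (n_a, n_b)
    # Each word moves all of its weight to the nearest word on the other side.
    a_to_b = float(w_a @ cost.min(axis=1))
    b_to_a = float(w_b @ cost.min(axis=0))
    # Both values are lower bounds on WMD; keep the tighter one.
    return max(a_to_b, b_to_a)


def rwmd_one_to_many(vocab_emb, doc_term, query_ids):
    """One-directional RWMD from every document to a single query.

    Assumption (not stated in the abstract): the per-word minimum distances
    to the query are computed once over the vocabulary and shared by all
    documents, so each document costs time linear in its number of words.

    vocab_emb: (vocab, dim) embeddings of the whole vocabulary.
    doc_term: (n_docs, vocab) normalized word weights, one row per document.
    query_ids: vocabulary indices of the unique words in the query.
    """
    # Distance from each vocabulary word to its closest query word.
    z = pairwise_dist(vocab_emb, vocab_emb[query_ids]).min(axis=1)
    # A weighted sum per document; with a sparse doc_term this is a
    # sparse matrix-vector product.
    return doc_term @ z
```

Calling `rwmd_one_to_many` with a single-row `doc_term` should reproduce the document-to-query direction of `rwmd` for the same pair; the other direction would need a second pass with the roles of query and documents swapped.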
