首页> 外文会议>2011 25th IEEE International Parallel Distributed Processing Symposium >A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms
【24h】

A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms

机译:在异构平台上构造反向文件的快速算法

获取原文

摘要

Given a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted files extremely fast. Our approach is tailored for a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. Our algorithm is based on a number of novel techniques including: (i) a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers, (ii) a hybrid trie and B-tree dictionary data structure in which the trie is represented by a table for fast look-up and each B-tree node contains string caches, (iii) allocation of parsed streams with frequent terms to CPU threads and the rest to GPU threads so as to match the throughput of parsed streams, and (iv) optimized CUDA indexer implementation that ensures coalesced memory accesses and effective use of shared memory. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 Quad-core) with two NVIDIA Tesla C1060 attached to it, and were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.
机译:给定驻留在磁盘上的文档的集合,我们开发了一种新的策略来处理这些文档并极其快速地构建反向文件。我们的方法是为包含多核CPU和高度多线程GPU的异构平台量身定制的。我们的算法基于许多新颖的技术,包括:(i)一种高吞吐量流水线策略,该策略产生并行解析的流,并由并行索引器以相同的速率消耗;(ii)混合trie和B树字典数据结构其中的特里用一个用于快速查找的表表示,并且每个B树节点都包含字符串缓存,(iii)将具有频繁术语的已解析流分配给CPU线程,其余分配给GPU线程,以匹配吞吐量。解析的流,以及(iv)优化的CUDA索引器实现,可确保合并的内存访问和有效使用共享内存。我们已经在连接了两个NVIDIA Tesla C1060的单个节点(两个Intel Xeon X5560四核)上对算法进行了广泛的测试,并在ClueWeb09数据集上实现了超过262 MB / s的吞吐量。对于广泛不同的数据集,获得了相似的结果。即使与在大型集群上运行的算法相比,我们的算法的吞吐量也优于文献中报道的最著名的算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号