A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms


Abstract

Given a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted files extremely fast. Our approach is tailored for a heterogeneous platform consisting of a multicore CPU and a highly multithreaded GPU. Our algorithm is based on a number of novel techniques including: (i) a high-throughput pipelined strategy that produces parallel parsed streams that are consumed at the same rate by parallel indexers, (ii) a hybrid trie and B-tree dictionary data structure in which the trie is represented by a table for fast look-up and each B-tree node contains string caches, (iii) allocation of parsed streams with frequent terms to CPU threads and the rest to GPU threads so as to match the throughput of parsed streams, and (iv) an optimized CUDA indexer implementation that ensures coalesced memory accesses and effective use of shared memory. We have performed extensive tests of our algorithm on a single node (two quad-core Intel Xeon X5560 processors) with two NVIDIA Tesla C1060 GPUs attached to it, and were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to that of the best known algorithms reported in the literature, even when compared to those run on large clusters.
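
The dictionary idea in item (ii) can be illustrated with a small single-threaded sketch. This is not the authors' implementation: it only shows how a trie over the first few characters of a term can be flattened into a table index, with each table slot pointing to a sorted bucket of terms. The sorted list is a stand-in for a B-tree, string caches are omitted, and the names `PREFIX_LEN`, `table_index`, `Bucket`, `HybridDictionary`, and `build_index` are all hypothetical, as is the choice of a two-letter prefix.

```python
from bisect import bisect_left

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PREFIX_LEN = 2  # assumed trie depth; the abstract does not state one


def table_index(term):
    """Map the first PREFIX_LEN letters of a term to a slot in a flat table.

    Slot 0 is a catch-all for terms that are shorter than PREFIX_LEN or
    that start with characters outside ALPHABET.
    """
    if len(term) < PREFIX_LEN:
        return 0
    index = 0
    for ch in term[:PREFIX_LEN]:
        pos = ALPHABET.find(ch)
        if pos < 0:
            return 0
        index = index * len(ALPHABET) + pos
    return index + 1


class Bucket:
    """Sorted term list with parallel postings lists (stand-in for a B-tree)."""

    def __init__(self):
        self.terms = []
        self.postings = []

    def add(self, term, doc_id):
        i = bisect_left(self.terms, term)
        if i == len(self.terms) or self.terms[i] != term:
            # New term: keep both lists aligned and sorted by term.
            self.terms.insert(i, term)
            self.postings.insert(i, [])
        plist = self.postings[i]
        # Documents arrive in increasing order, so checking the tail
        # is enough to avoid duplicate document IDs.
        if not plist or plist[-1] != doc_id:
            plist.append(doc_id)

    def get(self, term):
        i = bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return []


class HybridDictionary:
    """Flat prefix table whose slots lazily hold one Bucket each."""

    def __init__(self):
        self.table = [None] * (len(ALPHABET) ** PREFIX_LEN + 1)

    def add(self, term, doc_id):
        slot = table_index(term)
        if self.table[slot] is None:
            self.table[slot] = Bucket()
        self.table[slot].add(term, doc_id)

    def lookup(self, term):
        bucket = self.table[table_index(term)]
        return bucket.get(term) if bucket is not None else []


def build_index(docs):
    """Build an inverted index from a list of whitespace-tokenized documents."""
    index = HybridDictionary()
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index.add(term, doc_id)
    return index
```

For example, `build_index(["fast inverted files", "inverted index on gpu"]).lookup("inverted")` returns the postings list `[0, 1]`. The table replaces pointer chasing through the top trie levels with one array access, which is the property the abstract attributes to the table representation; everything below the table is simplified here.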


Bibliographic details

Source
2011 25th IEEE International Parallel and Distributed Processing Symposium | 2011 | pp. 1124-1134 | 11 pages
Authors
Wei Zheng; JaJa Joseph
Original format: PDF
Chinese Library Classification: TP311.133

Similar literature

1. A fast algorithm for constructing inverted files on heterogeneous platforms [J]. Zheng Wei, Joseph JaJa. Journal of Parallel and Distributed Computing. 2012, No. 5
2. A Fast Approximate String Matching Algorithm Using an Inverted File and Bit-arrays [J]. Hideki Shimomura, Toshikazu Fukushima. 情報処理学会論文誌. 1999, No. 4
3. An Optimized High-Throughput Strategy for Constructing Inverted Files [J]. Wei Zheng, JaJa Joseph. IEEE Transactions on Parallel and Distributed Systems. 2012, No. 11
4. A Fast Algorithm for Constructing Inverted Files on Heterogeneous Platforms [C]. Zheng Wei, Joseph JaJa. IEEE International Parallel and Distributed Processing Symposium. 2011
5. High-performance computing algorithms for constructing inverted files on emerging multicore processors [D]. Wei, Zheng. 2012
6. Efficient Inverted Index Compression Algorithm Characterized by Faster Decompression Compared with the Golomb-Rice Algorithm [O]. Andrzej Chmielowiec, Paweł Litwin. 2021
7. Figure 10: The pipeline showing how we gradually construct an appropriate .mat file format (MATLAB data file) to input in the competing algorithms and which segment of the pipeline we took into account in calculating their total cpu running time. [O].
