Journal: Journal of Parallel and Distributed Computing

Online multimedia retrieval on CPU-GPU platforms with adaptive work partition


Abstract

Nearest neighbor search is a core operation in several online multimedia services. These services have to handle very large databases while minimizing the query response times observed by users. This is especially complex because those services deal with fluctuating query workloads (rates). Consequently, they must adapt at run time to minimize response times as the load varies. In this paper, we address these challenges with a distributed-memory parallelization of the product quantization nearest neighbor search, also known as IVFADC, for hybrid CPU-GPU machines. Our parallel IVFADC implements an out-of-GPU-memory execution scheme that uses the GPU for databases whose index does not fit in GPU memory, which is crucial for searching very large databases. The careful use of CPU and GPU with work stealing led to an average response time reduction of 2.4× compared to using the GPU only. In addition, our approach for adapting the system to fluctuating loads, called Dynamic Query Processing Policy (DQPP), attained a response time reduction of up to 5× vs. the best static (BS) policy for moderate loads. The system attained high query processing rates and near-linear scalability in all experiments. We evaluated our system on a machine with up to 256 NVIDIA V100 GPUs processing a database of 256 billion SIFT feature vectors.
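
For readers unfamiliar with IVFADC, the following is a minimal single-threaded sketch of the general technique (a coarse inverted file plus product-quantized residuals searched with asymmetric distance lookup tables). It is not the paper's distributed CPU-GPU implementation and covers neither work stealing nor DQPP. The class name `IVFADC`, the helper functions, and all parameter names and defaults are assumptions made for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids

def split(v, m):
    """Cut v into m equal sub-vectors (len(v) must be divisible by m)."""
    step = len(v) // m
    return [v[i * step:(i + 1) * step] for i in range(m)]

class IVFADC:
    """Hypothetical toy index: n_lists inverted lists, m sub-quantizers."""

    def __init__(self, n_lists=4, m=4, n_codes=8):
        self.n_lists, self.m, self.n_codes = n_lists, m, n_codes

    def _residual(self, v, j):
        return [x - c for x, c in zip(v, self.coarse[j])]

    def train(self, points):
        # Coarse quantizer, then one codebook per residual sub-space.
        self.coarse = kmeans(points, self.n_lists)
        residuals = [self._residual(p, nearest(p, self.coarse)) for p in points]
        self.codebooks = [
            kmeans([split(r, self.m)[s] for r in residuals], self.n_codes, seed=s + 1)
            for s in range(self.m)
        ]

    def add(self, points):
        # Each vector is stored only as (id, PQ code) in its inverted list.
        self.lists = [[] for _ in range(self.n_lists)]
        for idx, p in enumerate(points):
            j = nearest(p, self.coarse)
            subs = split(self._residual(p, j), self.m)
            code = [nearest(sub, cb) for sub, cb in zip(subs, self.codebooks)]
            self.lists[j].append((idx, code))

    def search(self, q, k=5, n_probe=2):
        # Visit the n_probe closest lists; the query itself is never quantized.
        order = sorted(range(self.n_lists), key=lambda j: sq_dist(q, self.coarse[j]))
        hits = []
        for j in order[:n_probe]:
            subs = split(self._residual(q, j), self.m)
            # table[s][c]: distance from query sub-vector s to codeword c.
            table = [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, self.codebooks)]
            for idx, code in self.lists[j]:
                hits.append((sum(table[s][code[s]] for s in range(self.m)), idx))
        return sorted(hits)[:k]  # (approximate squared distance, id) pairs
```

In this sketch, `n_probe` trades accuracy for work: probing more inverted lists scans more codes per query. A typical call sequence would be `index.train(db)`, `index.add(db)`, then `index.search(query, k=5, n_probe=2)`.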
