A Term-Based Inverted Index Partitioning Model for Efficient Distributed Query Processing

B. BARLA CAMBAZOGLU; ENVER KAYAASLAN; SIMON JONASSEN; CEVDET AYKANAT


Abstract

In a shared-nothing, distributed text retrieval system, queries are processed over an inverted index that is partitioned among a number of index servers. In practice, the index is either document-based or term-based partitioned. This choice is made depending on the properties of the underlying hardware infrastructure, query traffic distribution, and some performance and availability constraints. In query processing on retrieval systems that adopt a term-based index partitioning strategy, the high communication overhead due to the transfer of large amounts of data from the index servers forms a major performance bottleneck, deteriorating the scalability of the entire distributed retrieval system. In this work, to alleviate this problem, we propose a novel inverted index partitioning model that relies on hypergraph partitioning. In the proposed model, concurrently accessed index entries are assigned to the same index servers, based on the inverted index access patterns extracted from the past query logs. The model aims to minimize the communication overhead that will be incurred by future queries while maintaining the computational load balance among the index servers. We evaluate the performance of the proposed model through extensive experiments using a real-life text collection and a search query sample. Our results show that considerable performance gains can be achieved relative to the term-based index partitioning strategies previously proposed in literature. In most cases, however, the performance remains inferior to that attained by document-based partitioning.
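The idea in the abstract can be illustrated with a small sketch. It treats each term as a vertex and each distinct query in the log as a net (hyperedge), then scores a term-to-server assignment by how many extra servers each query must touch. The abstract does not give the paper's exact cost function, so the connectivity-minus-one metric used here is a common hypergraph-partitioning objective chosen as a stand-in. The function names, the toy query log, and the per-term load values are all hypothetical.

```python
from collections import defaultdict


def build_hypergraph(query_log):
    """Map each distinct query (a set of terms) to how often it occurs.

    Each key is one net of the hypergraph; its value is the net weight.
    """
    nets = defaultdict(int)
    for query in query_log:
        nets[frozenset(query)] += 1
    return nets


def communication_cost(nets, assignment):
    """Connectivity-minus-one cost (an assumed stand-in objective).

    For every query, count the servers that hold at least one of its terms,
    subtract one, and weight the result by the query frequency.
    """
    cost = 0
    for terms, freq in nets.items():
        servers = {assignment[t] for t in terms}
        cost += freq * (len(servers) - 1)
    return cost


def server_loads(assignment, term_load, num_servers):
    """Sum the (hypothetical) per-term load on each server."""
    loads = [0] * num_servers
    for term, server in assignment.items():
        loads[server] += term_load[term]
    return loads


# Toy past query log (hypothetical data).
query_log = [
    ["new", "york"],
    ["new", "york"],
    ["york", "hotel"],
    ["cheap", "hotel"],
]

# Hypothetical load per term: here, the number of queries that access it.
term_load = {"new": 2, "york": 3, "hotel": 2, "cheap": 1}

# Two candidate assignments of terms to two index servers.
co_located = {"new": 0, "york": 0, "cheap": 1, "hotel": 1}
scattered = {"new": 0, "york": 1, "cheap": 0, "hotel": 1}

nets = build_hypergraph(query_log)
print(communication_cost(nets, co_located), server_loads(co_located, term_load, 2))
print(communication_cost(nets, scattered), server_loads(scattered, term_load, 2))
```

In this toy example, placing frequently co-queried terms on the same server lowers the cost while the total load per server stays comparable. A real partitioner would search over assignments under an explicit balance constraint rather than compare two hand-picked ones.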


Bibliographic details

Source
ACM Transactions on the Web, 2013, Issue 3, pp. 15.1-15.23 (23 pages)
Authors
B. BARLA CAMBAZOGLU; ENVER KAYAASLAN; SIMON JONASSEN; CEVDET AYKANAT
Author affiliations

Yahoo! Research;

Yahoo! Research Barcelona, Computer Science Department, Bilkent University;

Yahoo! Research Barcelona, Department of Computer and Information Science, Norwegian University of Science and Technology;

Computer Science Department, Bilkent University

Original format: PDF
Language: English
Keywords
Web search engine; term-based index partitioning; distributed query processing; hypergraph partitioning;


Similar literature

1. Efficient spatial query processing for KNN queries using well organised net-grid partition indexing approach [J]. K. Geetha, A. Kannan. International Journal of Data Mining, Modelling and Management, 2018, Issue 4.
2. Cache-Based Aggregate Query Shipping: An Efficient Scheme of Distributed OLAP Query Processing [J]. Hua-Ming Liao, Guo-Shun Pei. Journal of Computer Science & Technology, 2008, Issue 6.
3. Cache-Based Aggregate Query Shipping: An Efficient Scheme of Distributed OLAP Query Processing [J]. Hua-Ming Liao, Guo-Shun Pei. Journal of Computer Science and Technology (English edition), 2008, Issue 6.
4. Distributed query processing using partitioned inverted files [C]. Badue C., Ribeiro-Neto B., Baeza-Yates R. Proceedings of the Eighth International Symposium on String Processing and Information Retrieval (SPIRE 2001), 2001.
5. PPDQ-BG: Parallel partition and distributed query processing for big graphs [D]. Kandula, Lema. 2016.
6. Efficient Continuous Skyline Query Processing in Wireless Sensor Networks [O]. Yingyuan Xiao, Xu Jiao, Hongya Wang. 2019.
7. A term-based inverted index partitioning model for efficient distributed query processing [O]. Cambazoglu B.B., Kayaaslan E., Jonassen S., Aykanat C. 2013.
8. Methodology, Based on Analytical Modeling, for the Design of Parallel and Distributed Architectures for Relational Database Query Processors [R]. Kearns, T. G. 1987.
