
Shard Selection in Distributed Collaborative Search Engines A design, implementation and evaluation of shard selection in ElasticSearch



Abstract

To increase their scalability and reliability, many search engines today are distributed systems. In a distributed search engine, several nodes collaborate in handling the search operations. Usually each node is responsible for only one or a few parts of the index used for storing and searching. These smaller index parts are usually referred to as shards.

Lately ElasticSearch has emerged as a popular distributed search engine intended for medium- and large-scale searching. An ElasticSearch cluster could potentially consist of a lot of nodes and shards. Sending a search query to all nodes and shards might result in high latency when the size of the cluster is large or when the nodes are far apart from each other. ElasticSearch provides some features for limiting the number of nodes which participate in each search query in special cases, but generally each query will be processed by all nodes and shards.

Shard selection is a method used to forward queries only to the shards which are estimated to be highly relevant to a query. In this thesis a shard selection plugin called SAFE has been developed for ElasticSearch. SAFE implements four state-of-the-art shard selection algorithms and supports all current query types in ElasticSearch. The purpose of SAFE is to further increase the scalability of ElasticSearch by limiting the number of nodes which participate in each search query. The cost of using the plugin is that there might be a negative effect on the search results.

The purpose of this thesis has been to evaluate to what extent SAFE affects the search results in ElasticSearch. The four implemented algorithms have been compared in three different experiments using two different data sets. Two new metrics called Pk@N and Modified Recall have been developed for this thesis; they measure the relative performance between exhaustive search and shard selection in a search engine like ElasticSearch.

The results indicate that three algorithms in SAFE perform very well when documents are distributed to shards depending on which linguistic topic they belong to. However, if documents are randomly allocated to shards, which is the standard approach in ElasticSearch, then SAFE does not show any significant results and seems to be unusable.

This thesis shows that if a suitable document distribution policy is used and there is a tolerance for losing some relevant documents in the search results, then a shard selection implementation like SAFE could be used to further increase the scalability of a distributed search engine, especially in a low-resource environment.
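The idea of shard selection, and of comparing it against exhaustive search, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy shards, the document-frequency shard score, and the `overlap` ratio are hypothetical stand-ins. They are not the four algorithms implemented in SAFE, and `overlap` is not the thesis's definition of Pk@N or Modified Recall, neither of which is specified in the abstract.

```python
from collections import Counter

# Hypothetical toy index: documents are grouped into shards by topic.
SHARDS = {
    "sports": {"d1": "football match goal", "d2": "tennis match serve"},
    "cooking": {"d3": "bake bread oven", "d4": "grill steak oven"},
    "travel": {"d5": "flight hotel match", "d6": "train hotel city"},
}


def doc_score(query_terms, text):
    """Toy relevance score: total occurrences of the query terms in a document."""
    counts = Counter(text.split())
    return sum(counts[term] for term in query_terms)


def shard_score(query_terms, docs):
    """Toy shard score: summed document frequency of the query terms in a shard."""
    return sum(
        1
        for text in docs.values()
        for term in query_terms
        if term in text.split()
    )


def select_shards(query_terms, n):
    """Return the names of the n highest-scoring shards (ties broken by name)."""
    ranked = sorted(
        SHARDS, key=lambda name: (-shard_score(query_terms, SHARDS[name]), name)
    )
    return ranked[:n]


def search(query_terms, shard_names, k):
    """Return the top-k matching document IDs from the given shards only."""
    hits = []
    for name in shard_names:
        for doc_id, text in SHARDS[name].items():
            score = doc_score(query_terms, text)
            if score > 0:
                hits.append((-score, doc_id))
    return [doc_id for _, doc_id in sorted(hits)[:k]]


def overlap(exhaustive_ids, selective_ids):
    """Fraction of the exhaustive results that the selective search also found."""
    if not exhaustive_ids:
        return 1.0
    return len(set(exhaustive_ids) & set(selective_ids)) / len(exhaustive_ids)


if __name__ == "__main__":
    query = ["match", "goal"]
    exhaustive = search(query, list(SHARDS), k=3)
    selective = search(query, select_shards(query, n=1), k=3)
    print("exhaustive:", exhaustive)
    print("selective: ", selective)
    print("overlap:   ", overlap(exhaustive, selective))
```

In this sketch the selective search queries one shard instead of three. Because the toy documents are grouped by topic, most of the matching documents sit in the selected shard, which mirrors the abstract's observation that topical document distribution suits shard selection while random allocation does not.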


Bibliographic details

Author: Berglund, Per
Year: 2014
Original format: PDF
Language: English

