
Shard Selection in Distributed Collaborative Search Engines: A design, implementation and evaluation of shard selection in ElasticSearch


Abstract

To increase their scalability and reliability, many search engines today are distributed systems. In a distributed search engine, several nodes collaborate in handling search operations. Usually each node is responsible for only one or a few parts of the index used for storing and searching. These smaller index parts are usually referred to as shards.

Lately ElasticSearch has emerged as a popular distributed search engine intended for medium- and large-scale searching. An ElasticSearch cluster could potentially consist of a large number of nodes and shards. Sending a search query to all nodes and shards might result in high latency when the cluster is large or when the nodes are far apart from each other. ElasticSearch provides some features for limiting the number of nodes which participate in each search query in special cases, but generally each query will be processed by all nodes and shards.

Shard selection is a method for forwarding queries only to the shards which are estimated to be highly relevant to a query. In this thesis a shard selection plugin called SAFE has been developed for ElasticSearch. SAFE implements four state-of-the-art shard selection algorithms and supports all current query types in ElasticSearch. The purpose of SAFE is to further increase the scalability of ElasticSearch by limiting the number of nodes which participate in each search query. The cost of using the plugin is that there might be a negative effect on the search results.

The purpose of this thesis has been to evaluate to what extent SAFE affects the search results in ElasticSearch. The four implemented algorithms have been compared in three different experiments using two different data sets. Two new metrics called Pk@N and Modified Recall have been developed for this thesis; they measure the relative performance of exhaustive search and shard selection in a search engine like ElasticSearch.

The results indicate that three algorithms in SAFE perform very well when documents are distributed to shards depending on which linguistic topic they belong to. However, if documents are randomly allocated to shards, which is the standard approach in ElasticSearch, then SAFE does not show any significant results and seems to be unusable.

This thesis shows that if a suitable document distribution policy is used and there is a tolerance for losing some relevant documents in the search results, then a shard selection implementation like SAFE could be used to further increase the scalability of a distributed search engine, especially in a low-resource environment.
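
The general idea of shard selection described above can be illustrated with a minimal sketch. It is not one of the four algorithms implemented in SAFE, which the abstract does not name; the scoring rule (sum of log(1 + document frequency) over the query terms), the function names `build_shard_stats` and `select_shards`, and the toy topic-partitioned shards are all assumptions made for illustration only.

```python
from collections import Counter
import math


def build_shard_stats(shards):
    """Collect per-shard document frequencies.

    `shards` maps a shard id to a list of document strings.
    Returns {shard_id: Counter(term -> number of documents containing it)}.
    """
    stats = {}
    for shard_id, docs in shards.items():
        df = Counter()
        for doc in docs:
            # Count each term once per document.
            df.update(set(doc.lower().split()))
        stats[shard_id] = df
    return stats


def select_shards(query, stats, k):
    """Return the ids of the k shards estimated most relevant to `query`.

    Hypothetical scoring rule: sum of log(1 + df) over the query terms.
    Ties are broken by shard id so the result is deterministic.
    """
    terms = query.lower().split()
    scores = {
        shard_id: sum(math.log1p(df[term]) for term in terms)
        for shard_id, df in stats.items()
    }
    ranked = sorted(scores, key=lambda shard_id: (-scores[shard_id], shard_id))
    return ranked[:k]


# Toy cluster in which documents are allocated to shards by topic.
shards = {
    "sports": ["football match report", "tennis match result"],
    "cooking": ["pasta recipe", "bread recipe with yeast"],
    "tech": ["search engine index", "distributed search cluster"],
}
stats = build_shard_stats(shards)

# Only the selected shard would receive the query instead of all three.
selected = select_shards("distributed search", stats, k=1)
print(selected)
```

In this toy setup the query is forwarded to a single shard rather than to the whole cluster. If the same documents were spread randomly over the shards, the per-shard statistics would look alike and the ranking would carry little information, which is consistent with the abstract's observation about random allocation.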

Bibliographic details

  • Author

    Berglund Per

  • Affiliation
  • Year 2014
  • Total pages
  • Original format PDF
  • Language eng
  • CLC classification
