• English
    • svenska
  • English 
    • English
    • svenska
  • Login
View Item 
  •   Home
  • Student essays / Studentuppsatser
  • Department of Computer Science and Engineering / Institutionen för data- och informationsteknik
  • Masteruppsatser
  • View Item
  •   Home
  • Student essays / Studentuppsatser
  • Department of Computer Science and Engineering / Institutionen för data- och informationsteknik
  • Masteruppsatser
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Shard Selection in Distributed Collaborative Search Engines A design, implementation and evaluation of shard selection in ElasticSearch

Abstract
To increase their scalability and reliability many search engines today are distributed systems. In a distributed search engine several nodes collaborate in handling the search operations. Usually each node is only responsible for one or a few parts of the index used for storing and searching. These smaller index parts are usually referred to as shards. Lately ElasticSearch has emerged as a popular distributed search engine intended for medium- and large scale searching. An ElasticSearch cluster could potentially consist of a lot of nodes and shards. Sending a search query to all nodes and shards might result in high latency when the size of the cluster is large or when the nodes are far apart from each other. ElasticSearch provides some features for limiting the number of nodes which participate in each search query in special cases, but generally each query will be processed by all nodes and shards. Shard selection is a method used to only forward queries to the shards which are estimated to be highly relevant to a query. In this thesis a shard selection plugin called SAFE has been developed for ElasticSearch. SAFE implements four state of the art shard selection algorithms and supports all current query types in ElasticSearch. The purpose of SAFE is to further increase the scalability of ElasticSearch by limiting the number of nodes which participate in each search query. The cost of using the plugin is that there might be a negative e ect on the search results. The purpose of this thesis has been to evaluate to which extent SAFE a ects the search results in ElasticSearch. The four implemented algorithms have been compared in three di erent experiments using two di erent data sets. Two new metrics called Pk@N and Modi ed Recall have been developed for this thesis which measures the relative performance between exhaustive search and shard selection in a search engine like Elastic- Search. The results indicate that three algorithms in SAFE perform very well when documents are distributed to shards depending on which linguistic topic they belong to. However if documents are randomly allocated to shards, which is the standard approach in Elastic- Search, then SAFE does not show any signi cant results and seems to be unusable. This thesis shows that if a suitable document distribution policy is used and there is a tolerance for losing some relevant documents in the search results then a shard selection implementations like SAFE could be used to further increase the scalability of a distributed search engine, especially in a low resource environment.
Degree
Student essay
URI
http://hdl.handle.net/2077/36976
Collections
  • Masteruppsatser
View/Open
gupea_2077_36976_1.pdf (1.148Mb)
Date
2014-09-19
Author
Berglund, Per
Language
eng
Metadata
Show full item record

DSpace software copyright © 2002-2016  DuraSpace
Contact Us | Send Feedback
Theme by 
Atmire NV
 

 

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

LoginRegister

DSpace software copyright © 2002-2016  DuraSpace
Contact Us | Send Feedback
Theme by 
Atmire NV