This paper considers a multi-query optimization issue for distributed similarity query processing, which attempts to exploit the dependencies in the derivation of a query evaluation plan. To the best of our knowledge, this is the first work investigating a multi-query optimization technique for distributed similarity query processing (MDSQ). Four steps are incorporated in our MDSQ algorithm. First when a number of query requests (i.e., m query vectors and m radiuses) are simultaneously submitted by users, then a cost-based dynamic query scheduling (DQS) procedure is invoked to quickly and effectively identify the correlation among the query spheres (requests). After that, an index-based vector set reduction is performed at data node level in parallel. Finally, a refinement process of the candidate vectors is conducted to get the answer set. The proposed method includes a cost-based dynamic query scheduling, a Start-Distance (SD)-based load balancing scheme, and an index-based vector set reduction algorithm. The experimental results validate the efficiency and effectiveness of the algorithm in minimizing the response time and increasing the parallelism of I/O and CPU.
展开▼