Supporting queries over data dispersed across large-scale distributed systems, such as peer-to-peer networks, naturally calls for ranked retrieval that focuses on the most relevant (i.e., top-k) results. Although top-k retrieval has been actively studied, existing algorithms are too restrictive because of their assumptions about how the data is partitioned among the data sources. Unlike existing approaches, which assume a single type of data partitioning, we generalize the application scenario to peer-to-peer networks with a potentially large number of peers, in which the data may be partitioned in various ways. Specifically, we develop a novel unified top-k query processing framework that supports various types of data partitioning. To support top-k queries in this framework, we have developed efficient wavelet-based data synopses and algorithms that approximate the top-k results, with most operations occurring in the wavelet coefficient domain. Our simulation and experimental results show that the framework yields low bandwidth consumption, high accuracy, and low latency for top-k retrieval in peer-to-peer networks.
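To make the idea of working in the wavelet coefficient domain concrete, the following is a minimal sketch and not the algorithm of this work. It assumes that every peer holds one partial score per item over the same item identifiers, that the overall score of an item is the sum of its partial scores, and that a synopsis simply keeps the largest-magnitude coefficients of an orthonormal Haar transform. All function names (`haar_transform`, `make_synopsis`, `merge_synopses`, `approximate_top_k`) and the `budget` parameter are hypothetical and introduced only for illustration. Because the Haar transform is linear, synopses from different peers can be added coefficient by coefficient, and the scores are reconstructed only once at the querying peer.

```python
import heapq
import math


def haar_transform(values):
    """Orthonormal Haar decomposition; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in values]
    length = n
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        # Averages go to the front and are decomposed again in the next pass.
        coeffs[:length] = averages + details
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Invert haar_transform, rebuilding from the coarsest level to the finest."""
    n = len(coeffs)
    values = list(coeffs)
    length = 1
    while length < n:
        merged = []
        for a, d in zip(values[:length], values[length:2 * length]):
            merged.append((a + d) / math.sqrt(2))
            merged.append((a - d) / math.sqrt(2))
        values[:2 * length] = merged
        length *= 2
    return values


def make_synopsis(values, budget):
    """Assumed synopsis: keep the `budget` largest-magnitude coefficients."""
    coeffs = haar_transform(values)
    keep = heapq.nlargest(budget, range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    return {i: coeffs[i] for i in keep}


def merge_synopses(synopses):
    """Add sparse synopses coefficient by coefficient (valid by linearity)."""
    merged = {}
    for synopsis in synopses:
        for index, coeff in synopsis.items():
            merged[index] = merged.get(index, 0.0) + coeff
    return merged


def approximate_top_k(synopses, n, k):
    """Reconstruct approximate summed scores once and return the k best item ids."""
    merged = merge_synopses(synopses)
    scores = inverse_haar([merged.get(i, 0.0) for i in range(n)])
    return heapq.nlargest(k, range(n), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical partial scores held by two peers for the same eight items.
    peer_a = [1, 9, 2, 3, 0, 4, 8, 1]
    peer_b = [2, 7, 1, 0, 3, 1, 9, 2]
    synopses = [make_synopsis(scores, budget=4) for scores in (peer_a, peer_b)]
    print(approximate_top_k(synopses, n=8, k=2))
```

In this sketch each peer ships at most `budget` index-coefficient pairs instead of its full score list, which is where a bandwidth saving would come from. The answer is approximate whenever `budget` is smaller than the number of items; with `budget` equal to the number of items the merged coefficients reconstruct the exact summed scores.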