This work focuses on declustering data to improve query performance, because I/O is a bottleneck in databases and information retrieval systems that hold very large amounts of data. The bottleneck is not new, but it is becoming increasingly apparent. We therefore investigate declustering techniques, that is, techniques for distributing data across different disks according to the probability that items are retrieved together in the same query. The assumed architecture is a single processor with multiple disks, from which data can be accessed in parallel. We also investigate access structures that store data in a way that optimizes Boolean queries. The declustering techniques we propose perform better than traditional techniques such as random or round-robin allocation. We propose T-proximity and KT-proximity, which are suited to temporal databases, and set intersection-based, multiset intersection-based, vector, and Euclidean techniques, as well as a proximity technique, for information retrieval systems. The access structures we propose for Boolean queries give response times that are orders of magnitude lower than the traditional approach of treating a Boolean query as a separate query for each of its literals and then merging the results of those queries to obtain the final result.