This thesis presents the design and evaluation of DIMO, a distributed system for matching multimedia objects. DIMO lets multimedia applications find nearest neighbors in large-scale datasets. It also lets them define application-specific functions that further process the computed nearest neighbors. DIMO introduces novel methods for partitioning, searching, and storing high-dimensional datasets on distributed infrastructures that support the MapReduce programming model. We implemented DIMO and evaluated it extensively on Amazon clusters of up to 128 machines, using large datasets of up to 160 million data points extracted from images. Our results show that DIMO achieves high precision when compared against the ground-truth nearest neighbors, and that it can elastically use varying amounts of computing resources. In addition, DIMO outperforms the closest system in the literature by a large margin (up to 20%) in achieved average precision, while requiring less storage.