Data-intensive applications need to address the problem of how to properly place the set of data items to distributed storage nodes. Traditional techniques use the hashing method to achieve the load balance among nodes such as those in Hadoop and Cassandra, but they do not work efficiently for the requests reading multiple data items in one transaction, especially when the source locations of requests are also distributed. Recent works proposed the managed data placement schemes for online social networks, but have a limited scope of applications due to their focuses. We propose an associated data placement (ADP) scheme, which improves the co-location of associated data and the localized data serving while ensuring the balance between nodes. In ADP, we employ the hypergraph partitioning technique to efficiently partition the set of data items and place them to the distributed nodes, and we also take replicas and incremental adjustment into considerations. Through extensive experiments with both synthesized and trace-based datasets, we evaluate the performance of ADP and demonstrate its effectiveness.
展开▼