Architecture that facilitates scalable restoration of a cluster database using backups (e.g., SQL database backups) and a partition rebuild mechanism, achieving a high level of partition-level data consistency even when restore fails on individual machines and/or machine failure occurs. The architecture restores the replicas of each partition while taking into account that their backups may have been created at different points and at different times. Optimized parallelism is achieved because each database machine is restored from its own local backups, which eliminates cross-machine network traffic during restore. Thus, fast recovery of the distributed database can be accomplished on the order of hours across thousands of machines and terabytes of data.
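The two-phase flow described above (parallel restore from local backups, then a partition-level rebuild) can be illustrated with a minimal Python sketch. It is not the claimed implementation. All names here (`LocalBackup`, `RebuildPlan`, `restore_machine`, `plan_cluster_restore`, and the `sequence` and `corrupt` fields) are hypothetical. The rule of rebuilding from the most recent successfully restored replica is an assumption made for illustration, since the abstract does not state how a rebuild source is chosen.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LocalBackup:
    """Hypothetical record of one partition replica backup stored on a machine."""
    machine: str
    partition: str
    sequence: int          # assumed marker of how recent the backup is
    corrupt: bool = False  # simulates a restore failure on this machine


@dataclass
class RebuildPlan:
    """Which replica seeds the rebuild and which replicas must be rebuilt."""
    source: Optional[str]  # None means no replica of the partition was restored
    targets: List[str] = field(default_factory=list)


def restore_machine(backups: List[LocalBackup]) -> List[Tuple[LocalBackup, bool]]:
    # Stand-in for a real restore: reads only this machine's local backups,
    # so no cross-machine traffic is needed in this phase.
    return [(backup, not backup.corrupt) for backup in backups]


def plan_cluster_restore(
    backups_by_machine: Dict[str, List[LocalBackup]],
) -> Dict[str, RebuildPlan]:
    # Phase 1: every machine restores from its local backups in parallel.
    with ThreadPoolExecutor() as pool:
        per_machine = list(pool.map(restore_machine, backups_by_machine.values()))

    # Record, per partition, the restored sequence of each replica (None = failed).
    state: Dict[str, Dict[str, Optional[int]]] = {}
    for results in per_machine:
        for backup, ok in results:
            replicas = state.setdefault(backup.partition, {})
            replicas[backup.machine] = backup.sequence if ok else None

    # Phase 2: per partition, pick the newest restored replica (assumed rule)
    # and mark failed or older replicas for rebuild from it.
    plans: Dict[str, RebuildPlan] = {}
    for partition, replicas in state.items():
        restored = {m: s for m, s in replicas.items() if s is not None}
        if not restored:
            plans[partition] = RebuildPlan(source=None)
            continue
        source = max(sorted(restored), key=lambda m: restored[m])
        newest = restored[source]
        targets = sorted(
            m for m, s in replicas.items() if s is None or s < newest
        )
        plans[partition] = RebuildPlan(source=source, targets=targets)
    return plans
```

For example, if machine `m2` restores partition `p1` at sequence 7 while `m1` restores it at sequence 5 and the restore on `m3` fails, the sketch selects `m2` as the source and schedules `m1` and `m3` for rebuild. Only this second phase would require cross-machine traffic, and only for replicas that are missing or behind.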