Nowadays, more and more scientific experiments need to handle massive amounts of data. Their data processing consists of multiple computational steps and the dependencies among them. A data-intensive scientific workflow is an appropriate tool for modeling such a process. Since the execution of data-intensive scientific workflows requires large-scale computing and storage resources, a cloud environment, which provides virtually infinite resources, is appealing. However, because the scientific groups collaborating in these experiments are usually geographically distributed, multisite management of data-intensive scientific workflows in the cloud is becoming an important problem. This paper presents a general study of the current state of the art in data-intensive scientific workflow execution in the cloud and the corresponding multisite management techniques.