In almost all research fields, scientific studies can be carried out as in silico experiments. These experiments are modelled as scientific workflows, which describe the data or control flow between consecutive computational tasks. Since such experiments are data- and compute-intensive, they require parallel and distributed infrastructures (grids, clusters, clouds, and supercomputers) for their enactment. The complexity of these infrastructures and their continuously changing environment pose a major challenge to reproducibility, which is often needed for sharing results or for judging scientific claims within the scientific community. The parameters required for reproducible workflows can originate from different sources (infrastructural, third party, or related to the binaries), and they may change or become unavailable by the time of re-execution. In most cases, however, the lack of the original parameters can be compensated for by replacing, evaluating, or simulating the values of the descriptors at some extra cost, in order to make the workflow reproducible. In this paper we give the expected cost of making a workflow reproducible, or more precisely, we determine the probability that making a workflow reproducible costs more than a predefined cost C.
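The cost notions mentioned above can be illustrated with a small sketch. Assuming a hypothetical model in which each workflow descriptor has a known replacement cost and an availability probability (these names and the independence assumption are illustrative, not the paper's actual model), the expected repair cost and a Monte Carlo estimate of the probability that the total cost exceeds a threshold C might look like this:

```python
import random

def expected_cost(descriptor_costs, availability):
    """Expected extra cost: each descriptor contributes its replacement
    cost weighted by the probability that it is unavailable."""
    return sum(c * (1 - p) for c, p in zip(descriptor_costs, availability))

def repair_cost(descriptor_costs, availability):
    """Total extra cost of one re-execution attempt: every descriptor
    that turns out to be unavailable must be replaced at its cost."""
    return sum(c for c, p in zip(descriptor_costs, availability)
               if random.random() > p)

def prob_cost_exceeds(C, descriptor_costs, availability, trials=100_000):
    """Monte Carlo estimate of P(total repair cost > C)."""
    hits = sum(repair_cost(descriptor_costs, availability) > C
               for _ in range(trials))
    return hits / trials
```

For example, two descriptors with costs 1 and 2 that are each available with probability 0.5 have an expected repair cost of `1*0.5 + 2*0.5 = 1.5`; `prob_cost_exceeds` then estimates how often the realized cost passes a given budget C. This is only a sketch of the cost framing under stated assumptions, not the probabilistic analysis developed in the paper.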