
Temporal representation for mining scientific data provenance

Abstract

Provenance of digital scientific data is a distinct piece of metadata about a data object. It can serve, for instance, as a "ground truth" for determining the cause of an execution failure, or it can explain a particular result to a researcher intending to reuse a data object. Provenance can quickly grow voluminous and feature-rich, requiring new structures and concepts that support data mining. We propose a representation of data provenance using logical time that reduces the feature space of the provenance. The temporal representation supports clustering, classification, and association rule mining. This paper studies the full utility of the temporal representation through an empirical evaluation and identifies the data mining algorithms that are most effective when applied to the proposed representation. The evaluation is carried out against a multi-gigabyte semi-synthetic provenance dataset built from a range of scientific workflows, and against a real one-month provenance dataset gathered from a satellite instrument. By analyzing the results with two clustering metrics, purity and Normalized Mutual Information (NMI), we determine that the k-means algorithm gives the best clustering with the proposed temporal representation, while still yielding provenance-useful information.
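The two clustering metrics named in the abstract, purity and NMI, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names and toy labels are hypothetical, and the NMI here is normalized by the arithmetic mean of the two entropies, which is one common convention and may differ from the one used in the paper.

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, p in zip(labels_true, labels_pred):
        per_cluster.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in per_cluster.values())
    return majority_total / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information between true classes and predicted clusters."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
    return mi


def nmi(labels_true, labels_pred):
    """NMI normalized by the arithmetic mean of the two entropies (an assumption)."""
    h_true = entropy(labels_true)
    h_pred = entropy(labels_pred)
    if h_true == 0 and h_pred == 0:
        return 1.0  # both assignments are a single group, so they agree trivially
    return 2 * mutual_information(labels_true, labels_pred) / (h_true + h_pred)


if __name__ == "__main__":
    # Hypothetical toy data: six items, two true classes, two clusters.
    truth = ["a", "a", "a", "b", "b", "b"]
    clusters = [0, 0, 1, 1, 1, 1]
    print("purity:", purity(truth, clusters))
    print("NMI:", nmi(truth, clusters))
```

Purity rewards clusters dominated by a single class but can be inflated by producing many small clusters; NMI penalizes that, which is one reason the two metrics are often reported together.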

Bibliographic details

  • Source
    Future Generation Computer Systems | 2014, Issue 7 | pp. 363-378 | 16 pages
  • Author affiliations

    School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA;

    School of Informatics and Computing, Indiana University Bloomington, Bloomington, IN, USA;

    Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Provenance; Temporal representation; Data mining;

