IBM Journal of Research and Development
Dataflow representation of data analyses: Toward a platform for collaborative data science

Abstract

Data science plays an increasingly important role in solving today's scientific and social challenges. To promote progress toward a cure for multiple sclerosis, the Accelerated Cure Project has created an open repository of biological and survey data on patients with multiple sclerosis. Similar large-scale repositories are being created in other domains. As the open, data-driven model of science proliferates, the research community faces a growing need for a cloud platform for collaborative data science. Such a platform should facilitate collaboration between domain experts and data scientists and possess artificial intelligence capabilities for organizing, recommending, and manipulating data analyses. In this paper, we present some foundational technologies motivated by this vision. Our system automatically extracts a high-level dataflow graph from a data analysis. This graph describes how data flows through an analysis pipeline, including which statistical methods are used and how they fit together. The system requires no special annotations from the data analyst and consumes analyses written in Python using standard tools, such as Scikit-learn and Statsmodels. In this paper, we explain how our system works and how it fits into our larger vision for a collaborative data science platform.
