World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data

Abstract

Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are the tens of millions of projects in the periphery interconnected through technical dependencies, code sharing, or knowledge flow? To answer such questions we: a) create a very large and frequently updated collection of version control data in the entire FLOSS ecosystems named World of Code (WoC), that can completely cross-reference authors, projects, commits, blobs, dependencies, and history of the FLOSS ecosystems and b) provide capabilities to efficiently correct, augment, query, and analyze that data. Our current WoC implementation is capable of being updated on a monthly basis and contains over 18B Git objects. To evaluate its research potential and to create vignettes for its usage, we employ WoC in conducting several research tasks. In particular, we find that it is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage. We expect WoC to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem. Our infrastructure facilitates the discovery of key technical dependencies, code flow, and social networks that provide the basis to determine the structure and evolution of the relationships that drive FLOSS activities and innovation.

Bibliographic record

  • Source
    Empirical Software Engineering | 2021, Issue 2 | 22.1-22.42 | 42 pages
  • Author affiliations

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

    Carnegie Mellon Univ Inst Software Res Pittsburgh PA 15213 USA;

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

    Carnegie Mellon Univ Inst Software Res Pittsburgh PA 15213 USA;

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

    Univ Tennessee Dept Business Analyt & Stat Knoxville Stokely Management Ctr 916 Volunteer Bl Knoxville TN 37916 USA;

    Univ Tennessee Dept Elect Engn & Comp Sci Knoxville Min H Kao Bldg Room 619 1520 Middle Dr Knoxville TN 37996 USA;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Software mining; Software supply chain; Software ecosystem;
