Conference: IEEE International Conference on Software Maintenance and Evolution

A Large-scale Data Set and an Empirical Study of Docker Images Hosted on Docker Hub

Abstract

Docker is currently one of the most popular containerization solutions. Previous work investigated various characteristics of the Docker ecosystem, but has mainly focused on Dockerfiles from GitHub, limiting the type of questions that can be asked, and did not investigate evolution aspects. In this paper, we create a recent and more comprehensive data set by collecting data from Docker Hub, GitHub, and Bitbucket. Our data set contains information about 3,364,529 Docker images and 378,615 git repositories behind them. Using this data set, we conduct a large-scale empirical study with four research questions where we reproduce previously explored characteristics (e.g., popular languages and base images), investigate new characteristics such as image tagging practices, and study evolution trends. Our results demonstrate the maturity of the Docker ecosystem: we find more reliance on ready-to-use language and application base images as opposed to yet-to-be-configured OS images, a downward trend of Docker image sizes demonstrating the adoption of best practices of keeping images small, and a declining trend in the number of smells in Dockerfiles suggesting a general improvement in quality. On the downside, we find an upward trend in using obsolete OS base images, posing security risks, and find problematic usages of the latest tag, including version lagging. Overall, our results bring good news such as more developers following best practices, but they also indicate the need to build tools and infrastructure embracing new trends and addressing potential issues.