Conference: IEEE/ACM International Conference on Mining Software Repositories

SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts

Abstract

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large number of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on a first analysis using the dataset, we present insights into the evolution of SO posts, e.g., that post edits are usually small and happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text. Further, our analysis revealed a close relationship between post edits and comments. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub.
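
To make the idea of reconstructing block-level version history more concrete, the following is a minimal sketch of how text and code blocks in two consecutive versions of a post could be linked using a string similarity metric. It is an illustration only, not the procedure used to build SOTorrent: the similarity function is Python's standard `difflib.SequenceMatcher` ratio (not necessarily one of the 134 metrics the paper evaluated), and the names `similarity` and `match_blocks`, the `(kind, content)` block representation, and the threshold of 0.6 are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] (difflib ratio, an assumed stand-in metric)."""
    return SequenceMatcher(None, a, b).ratio()


def match_blocks(old_blocks, new_blocks, threshold=0.6):
    """Link each block in the new version to at most one predecessor block.

    Blocks are (kind, content) tuples, where kind is "text" or "code".
    Only blocks of the same kind are compared. Returns a list with one entry
    per new block: the index of its predecessor in old_blocks, or None.
    The threshold is a hypothetical value chosen for illustration.
    """
    candidates = []
    for j, (new_kind, new_content) in enumerate(new_blocks):
        for i, (old_kind, old_content) in enumerate(old_blocks):
            if old_kind != new_kind:
                continue  # never link a text block to a code block
            score = similarity(old_content, new_content)
            if score >= threshold:
                candidates.append((score, i, j))

    # Greedy assignment: most similar pairs first, ties broken by position.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    predecessors = [None] * len(new_blocks)
    used_old = set()
    for _score, i, j in candidates:
        if i in used_old or predecessors[j] is not None:
            continue
        predecessors[j] = i
        used_old.add(i)
    return predecessors


# Two hypothetical versions of the same answer.
v1 = [
    ("text", "You can reverse a list like this:"),
    ("code", "xs = [1, 2, 3]\nxs.reverse()\nprint xs"),
]
v2 = [
    ("text", "You can reverse a list in place like this:"),
    ("code", "xs = [1, 2, 3]\nxs.reverse()\nprint(xs)"),
    ("text", "Note that reverse() returns None."),
]

links = match_blocks(v1, v2)
```

In this example, the first two blocks of `v2` are linked to their edited predecessors in `v1`, while the newly added note has no predecessor. A greedy assignment is used here for brevity; which metric and threshold work best for text versus code blocks is exactly the kind of question the paper's metric evaluation addresses.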