Journal: Empirical Software Engineering

On using Stack Overflow comment-edit pairs to recommend code maintenance changes


Abstract

Code maintenance data sets typically consist of a before version of the code and an after version that contains the improvement or fix. Such data sets are important for various software engineering support tools related to code maintenance, such as program repair, code recommender systems, and Application Programming Interface (API) misuse detection. Most current data sets are constructed by mining commit histories in version-control systems or issues in issue-tracking systems. In this paper, we investigate whether Stack Overflow can serve as an additional source for building code maintenance data sets. Comments on Stack Overflow give developers an effective way to point out problems with existing answers, alternative solutions, or pitfalls. Because of Stack Overflow's crowd-sourced nature, answers are then updated to incorporate these suggestions. We mine comment-edit pairs from Stack Overflow and investigate their potential usefulness for constructing the above data sets. These pairs have the added benefit of including a concrete explanation of why each change is needed, and they potentially contain fewer tangled changes. We first design a technique to extract related comment-edit pairs and then investigate the nature of these pairs both qualitatively and quantitatively. We find that the majority of comment-edit pairs are not tangled, but that only 27% of the studied pairs are potentially useful for the above applications. We categorize the mined pairs by type and find that the highest ratio of useful pairs comes from the categories Correction, Obsolete, Flaw, and Extension. These categories can provide data for both corrective and preventative maintenance activities.
To demonstrate the effectiveness of our extracted pairs, we submitted 15 pull requests to popular GitHub repositories, 10 of which have been accepted, including by widely used projects such as Apache Beam (https://beam.apache.org/) and NLTK (https://www.nltk.org/). Our work is the first to investigate Stack Overflow comment-edit pairs and opens the door to future work in this direction. Based on our findings and observations, we offer concrete suggestions for identifying a larger set of useful comment-edit pairs, an effort that our shared data can also support.
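To make the notion of a comment-edit pair more concrete, the following is a minimal Python sketch under stated assumptions. It is not the extraction technique designed in the paper, which the abstract does not describe. Instead it uses a naive, hypothetical heuristic: a comment is paired with the next revision of the same answer if that revision changes the code and is made within a fixed time window (assumed here to be seven days). The classes `Comment`, `Revision`, and `CommentEditPair`, the function names, and the sample data are all hypothetical, and the sketch assumes the code snippet has already been extracted from each revision of the answer body.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
import difflib


@dataclass
class Comment:  # hypothetical record for a comment on an answer
    post_id: int
    created: datetime
    text: str


@dataclass
class Revision:  # hypothetical record: code in one revision of an answer
    post_id: int
    created: datetime
    code: str


@dataclass
class CommentEditPair:
    comment: Comment
    before: str  # code as it stood when the comment was written
    after: str   # code in the first revision made after the comment


def pair_comments_with_edits(comments, revisions, window=timedelta(days=7)):
    """Naive temporal heuristic (an assumption, not the paper's method):
    pair a comment with the revision that immediately follows it on the
    same answer, if that revision changes the code within `window`."""
    history = {}
    for rev in sorted(revisions, key=lambda r: r.created):
        history.setdefault(rev.post_id, []).append(rev)

    pairs = []
    for c in comments:
        revs = history.get(c.post_id, [])
        for prev, nxt in zip(revs, revs[1:]):
            # prev is the version the comment refers to; nxt is the next edit
            if prev.created <= c.created < nxt.created <= c.created + window:
                if prev.code != nxt.code:
                    pairs.append(CommentEditPair(c, prev.code, nxt.code))
                break  # only the revision directly after the comment is considered
    return pairs


def as_unified_diff(pair):
    """Render the before/after code of a pair as a unified diff."""
    return "\n".join(difflib.unified_diff(
        pair.before.splitlines(), pair.after.splitlines(),
        "before", "after", lineterm=""))


# Hypothetical usage with made-up data resembling an "Obsolete" pair.
t0 = datetime(2020, 1, 1)
revisions = [
    Revision(1, t0, "d = dict()\nd.has_key('a')"),
    Revision(1, t0 + timedelta(days=2), "d = dict()\n'a' in d"),
]
comments = [Comment(1, t0 + timedelta(days=1), "has_key was removed in Python 3; use `in`.")]
pairs = pair_comments_with_edits(comments, revisions)
diff_text = as_unified_diff(pairs[0])
```

A purely temporal heuristic like this can pair an edit with an unrelated comment, so its output would still need the kind of qualitative and quantitative inspection of pair relevance and usefulness that the abstract describes.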
