On using Stack Overflow comment-edit pairs to recommend code maintenance changes

Henry Tang; Sarah Nadi

Abstract

Code maintenance data sets typically consist of a before version of the code and an after version that contains the improvement or fix. Such data sets are important for various software engineering support tools related to code maintenance, such as program repair, code recommender systems, or Application Programming Interface (API) misuse detection. Most current data sets are constructed by mining commit history in version-control systems or issues in issue-tracking systems. In this paper, we investigate whether Stack Overflow can be used as an additional source for building code maintenance data sets. Comments on Stack Overflow provide an effective way for developers to point out problems with existing answers, alternative solutions, or pitfalls. Given the crowd-sourced nature of Stack Overflow, answers are then updated to incorporate these suggestions. We mine comment-edit pairs from Stack Overflow and investigate their potential usefulness for constructing the above data sets. These comment-edit pairs have the added benefit of carrying concrete descriptions or explanations of why the change is needed, as well as potentially containing fewer tangled changes to deal with. We first design a technique to extract related comment-edit pairs and then qualitatively and quantitatively investigate the nature of these pairs. We find that the majority of comment-edit pairs are not tangled, but that only 27% of the studied pairs are potentially useful for the above applications. We categorize the types of mined pairs and find that the highest ratio of useful pairs comes from those categorized as Correction, Obsolete, Flaw, and Extension. These categories can provide data for both corrective and preventative maintenance activities.
To demonstrate the effectiveness of our extracted pairs, we submitted 15 pull requests to popular GitHub repositories, 10 of which have been accepted to widely used repositories such as Apache Beam (https://beam.apache.org/) and NLTK (https://www.nltk.org/). Our work is the first to investigate Stack Overflow comment-edit pairs and opens the door for future work in this direction. Based on our findings and observations, we provide concrete suggestions on how to potentially identify a larger set of useful comment-edit pairs, which can also be facilitated by our shared data.
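
The abstract does not describe how related comment-edit pairs are extracted, so the following is only an illustrative sketch and not the authors' technique. It assumes a simple heuristic: a comment is paired with the first later edit of the same answer whose changed lines contain a code token that the comment mentions in backticks. The `Comment` and `Edit` types and both function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
import difflib
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


@dataclass
class Comment:
    text: str
    created: datetime


@dataclass
class Edit:
    before: str  # answer code before the edit
    after: str   # answer code after the edit
    created: datetime


def changed_tokens(before, after):
    """Identifier-like tokens found on lines the edit added or removed."""
    tokens = set()
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith(("+ ", "- ")):
            tokens.update(IDENTIFIER.findall(line[2:]))
    return tokens


def code_tokens(comment_text):
    """Identifier-like tokens inside `inline code` spans of a comment."""
    tokens = set()
    for span in re.findall(r"`([^`]+)`", comment_text):
        tokens.update(IDENTIFIER.findall(span))
    return tokens


def pair_comments_with_edits(comments, edits):
    """Pair each comment with the first later edit that touches a token it mentions."""
    ordered = sorted(edits, key=lambda e: e.created)
    pairs = []
    for comment in comments:
        mentioned = code_tokens(comment.text)
        for edit in ordered:
            # Assumption: only edits made after the comment can respond to it.
            if edit.created <= comment.created:
                continue
            if mentioned & changed_tokens(edit.before, edit.after):
                pairs.append((comment, edit))
                break
    return pairs
```

Under this heuristic, a comment such as "Prefer `isinstance` over comparing types." would be paired with a later edit that replaces a `type(x) == int` check with `isinstance(x, int)`, while a comment with no code mention would be left unpaired. A real pipeline would need stricter matching, since an overlap in tokens alone does not show that the edit responds to the comment.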

Bibliographic details

Source: Empirical Software Engineering, 2021, Issue 4, pp. 68.1-68.35 (35 pages)
Authors: Henry Tang; Sarah Nadi
Affiliation: University of Alberta, Edmonton, Canada (both authors)
Original format: PDF
Language: English
Keywords: Stack Overflow; Comment-edit pairs; Bug-fix data sets
