Journal: ACM Transactions on Software Engineering and Methodology
Generating Question Titles for Stack Overflow from Mined Code Snippets

Abstract

Stack Overflow is widely used by software developers to seek programming-related information from peers over the internet. The Stack Overflow community recommends that users include the relevant code snippet when creating a question, so that others can better understand it and offer help. Previous studies have shown that a significant number of these questions are of low quality and unattractive to potential experts on Stack Overflow. Such poorly asked questions are less likely to receive useful answers, and they hinder the overall process of knowledge generation and sharing. One reason for low-quality questions on Stack Overflow is that many developers may be unable to clarify and summarize the key problems behind the code snippets they present, because they lack knowledge and terminology related to the problem, have poor writing skills, or both. In this study, we propose an approach that assists developers in writing high-quality questions by automatically generating a question title for a code snippet using deep sequence-to-sequence learning. Our approach is fully data-driven and uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-words problem, and a coverage mechanism to eliminate word repetition. We evaluate our approach on Stack Overflow datasets covering a variety of programming languages (e.g., Python, Java, JavaScript, C#, and SQL), and our experimental results show that it significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets so that other researchers can verify their ideas and to inspire follow-up work.
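
The abstract names three mechanisms (attention, copy, coverage) without giving their equations. As a rough illustration only, the sketch below shows how the three commonly fit together in a single decoding step, assuming the widely used pointer-generator formulation; this is an assumption and may differ from the authors' released code. All function names, the toy tokens, and every number are hypothetical.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn raw alignment scores into an attention distribution."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def final_distribution(p_vocab, attention, source_tokens, p_gen):
    """Mix generating from the vocabulary with copying from the snippet.

    p_vocab:   dict mapping word -> probability from the vocabulary softmax
    attention: one attention weight per source token
    p_gen:     probability of generating instead of copying (0..1)
    """
    mixed = defaultdict(float)
    for word, prob in p_vocab.items():
        mixed[word] += p_gen * prob
    # Copy mechanism: source tokens receive probability mass from attention,
    # so identifiers missing from the vocabulary can still be emitted.
    for token, weight in zip(source_tokens, attention):
        mixed[token] += (1.0 - p_gen) * weight
    return dict(mixed)


def coverage_loss(attention, coverage):
    """Penalize attending again to positions that were already attended."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


# Toy decoding step; every value here is made up for illustration.
source_tokens = ["df", ".", "groupby", "(", "col", ")"]
attention = softmax([0.1, -2.0, 3.0, -2.0, 0.5, -2.0])
p_vocab = {"how": 0.5, "to": 0.3, "pandas": 0.2}
dist = final_distribution(p_vocab, attention, source_tokens, p_gen=0.6)

coverage = [0.0] * len(source_tokens)  # nothing attended before this step
loss_first = coverage_loss(attention, coverage)
coverage = [c + a for c, a in zip(coverage, attention)]
loss_repeat = coverage_loss(attention, coverage)  # same attention again
```

In this toy step the identifier `groupby` is absent from the vocabulary yet still receives probability through the copy term, and reusing the same attention at a second step incurs a nonzero coverage loss, which is the signal that discourages repeated words.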