Duplicate Question Detection With Deep Learning in Stack Overflow

Wang Liting; Zhang Li; Jiang Jing

Abstract

Stack Overflow is a popular community-based question answering (CQA) website focused on software programming, and it has attracted a growing number of users in recent years. However, duplicate questions appear frequently on Stack Overflow, and they are marked manually by high-reputation users. Automatic duplicate question detection reduces the labor and effort required of these users. Existing approaches extract textual features to detect duplicate questions automatically, but they are limited because semantic information can be lost. To tackle this problem, we explore the use of deep learning techniques, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM), to detect duplicate questions in Stack Overflow. In addition, we use Word2Vec to obtain vector representations of words. The deep learning models and Word2Vec capture semantic information at the document level and the word level, respectively. We therefore construct three deep learning approaches, WV-CNN, WV-RNN and WV-LSTM, which combine Word2Vec with CNN, RNN and LSTM respectively, to detect duplicate questions in Stack Overflow. Evaluation results show that WV-CNN and WV-LSTM achieve significant improvements over four baseline approaches (i.e., DupPredictor, Dupe, DupPredictorRep-T, and DupeRep) and three deep learning approaches (i.e., DQ-CNN, DQ-RNN, and DQ-LSTM) in terms of recall-rate@5, recall-rate@10 and recall-rate@20. Furthermore, the experimental results indicate that our approaches WV-CNN, WV-RNN, and WV-LSTM outperform four machine learning approaches based on Support Vector Machine, Logistic Regression, Random Forest and eXtreme Gradient Boosting in terms of recall-rate@5, recall-rate@10 and recall-rate@20.
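The abstract reports results as recall-rate@5, recall-rate@10 and recall-rate@20 but does not define the metric on this page. The following is a minimal sketch, assuming the definition commonly used in duplicate-detection work: the fraction of duplicate questions for which at least one of their master questions appears among the top k ranked candidates. The function name `recall_rate_at_k`, the dictionary layout, and the toy question ids are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence, Set


def recall_rate_at_k(
    ranked: Dict[str, Sequence[str]],
    masters: Dict[str, Set[str]],
    k: int,
) -> float:
    """Share of duplicate questions with a master question in the top-k candidates.

    ranked:  duplicate question id -> candidate ids, best match first
    masters: duplicate question id -> ids of its known master questions
    """
    if not ranked:
        return 0.0
    hits = sum(
        1
        for qid, candidates in ranked.items()
        # A hit means the top-k slice shares at least one id with the masters.
        if masters.get(qid, set()) & set(candidates[:k])
    )
    return hits / len(ranked)


# Toy data with made-up ids (assumption for illustration only).
ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
    "q4": ["j", "k", "l"],
}
masters = {"q1": {"a"}, "q2": {"f"}, "q3": {"z"}, "q4": {"k"}}

for k in (1, 2, 3):
    print(k, recall_rate_at_k(ranked, masters, k))
```

Under this assumed definition, a larger k can only keep or raise the score, which is why results at several cutoffs are usually reported together.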

Bibliographic details

Source: Quality Control, Transactions, 2020, issue 2020, pp. 25964-25975 (12 pages)
Authors: Wang Liting; Zhang Li; Jiang Jing
Author affiliation (all three authors): State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, People's Republic of China
Original format: PDF
Language: English
Keywords: Stack Overflow; duplicate question detection; deep learning
