Journal: Quality Control, Transactions

Duplicate Question Detection With Deep Learning in Stack Overflow



Abstract

Stack Overflow is a popular Community-based Question Answering (CQA) website focused on software programming, and it has attracted a growing number of users in recent years. However, duplicate questions appear frequently on Stack Overflow, and they are marked manually by high-reputation users. Automatic duplicate question detection reduces the effort required of these users. Although existing approaches extract textual features to detect duplicate questions automatically, they are limited because semantic information can be lost. To tackle this problem, we explore the use of deep learning techniques, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM), to detect duplicate questions in Stack Overflow. In addition, we use Word2Vec to obtain vector representations of words. The deep learning models and Word2Vec capture semantic information at the document level and the word level, respectively. We therefore construct three deep learning approaches, WV-CNN, WV-RNN, and WV-LSTM, which combine Word2Vec with CNN, RNN, and LSTM, respectively, to detect duplicate questions in Stack Overflow. Evaluation results show that WV-CNN and WV-LSTM achieve significant improvements over four baseline approaches (i.e., DupPredictor, Dupe, DupPredictorRep-T, and DupeRep) and three deep learning approaches (i.e., DQ-CNN, DQ-RNN, and DQ-LSTM) in terms of recall-rate@5, recall-rate@10, and recall-rate@20. Furthermore, the experimental results indicate that WV-CNN, WV-RNN, and WV-LSTM outperform four machine learning approaches based on Support Vector Machine, Logistic Regression, Random Forest, and eXtreme Gradient Boosting in terms of recall-rate@5, recall-rate@10, and recall-rate@20.
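
The abstract reports results as recall-rate@k but does not define the metric. The sketch below assumes the definition commonly used in duplicate-detection work: the fraction of duplicate questions for which at least one true master question appears among the top k ranked candidates. The function name, question IDs, and rankings are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def recall_rate_at_k(
    ranked: Dict[str, List[str]],
    masters: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of duplicate questions with a true master in the top-k candidates.

    ranked  maps each duplicate question ID to its candidate list, best first.
    masters maps each duplicate question ID to the set of its true master IDs.
    (Assumed definition; the abstract itself does not spell the metric out.)
    """
    if not masters:
        return 0.0
    hits = 0
    for question_id, true_masters in masters.items():
        top_k = ranked.get(question_id, [])[:k]
        # Count the question as a hit if any true master is in the top k.
        if any(candidate in true_masters for candidate in top_k):
            hits += 1
    return hits / len(masters)


# Hypothetical rankings produced by some detector for three duplicate questions.
ranked = {
    "q1": ["m7", "m1", "m3"],
    "q2": ["m9", "m4", "m2"],
    "q3": ["m5", "m8", "m6"],
}
# Hypothetical ground truth: q3's master never appears in its candidate list.
masters = {"q1": {"m1"}, "q2": {"m2"}, "q3": {"m0"}}

for k in (1, 2, 3):
    print(f"recall-rate@{k} = {recall_rate_at_k(ranked, masters, k):.3f}")
```

Under this definition the score can only stay the same or rise as k grows, which is why results at several cutoffs such as 5, 10, and 20 are typically reported together.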
