Duplicate Detection in Programming Question Answering Communities

Zhang Wei Emma; Sheng Quan Z.; Lau Jey Han; Abebe Ermyas; Ruan Wenjie


Abstract

Community-based Question Answering (CQA) websites have attracted increasing numbers of users and contributors in recent years. However, duplicate questions occur frequently on CQA websites and are currently identified manually by moderators. Automatic duplicate detection, on the one hand, reduces this laborious effort for moderators before they take closing actions and, on the other hand, helps question issuers find answers quickly. A number of studies have looked into related problems, but very few target duplicate detection in Programming CQA (PCQA), a branch of CQA dedicated to programmers. Existing works frame the task as a supervised learning problem over question pairs and rely only on textual features. Moreover, the issue of selecting candidate duplicates from large volumes of historical questions is often left unaddressed. To tackle these issues, we model duplicate detection as a two-stage "ranking-classification" problem over question pairs. In the first stage, we rank the historical questions by their similarity to the newly issued question and select the top-ranked ones as candidates, which reduces the search space. In the second stage, we develop novel features that capture both textual similarity and latent semantics of question pairs, leveraging techniques from the deep learning and information retrieval literature. Experiments on real-world questions about multiple programming languages demonstrate that our method works very well, in some cases achieving up to 25% improvement over state-of-the-art benchmarks.
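
The two-stage "ranking-classification" idea can be illustrated with a minimal sketch. The abstract does not say which ranking function, features, or classifier the authors use, so everything concrete here is an assumption made for illustration: TF-IDF cosine similarity stands in for the first-stage ranker, a single Jaccard token-overlap feature stands in for the second-stage features, and a fixed threshold stands in for a trained classifier. All function names (`rank_candidates`, `pair_features`, `is_duplicate`, `detect_duplicates`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(new_question, history, k=2):
    """Stage 1 (assumed ranker): top-k historical questions by TF-IDF cosine."""
    docs = [tokenize(q) for q in history] + [tokenize(new_question)]
    vecs = tfidf_vectors(docs)
    query = vecs[-1]
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vecs[:-1])]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, stable on ties
    return [(i, score) for score, i in scored[:k]]


def pair_features(a, b):
    """Stage 2 (assumed feature): Jaccard overlap of the two token sets."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    return {"jaccard": len(ta & tb) / len(union) if union else 0.0}


def is_duplicate(features, threshold=0.5):
    """Placeholder for a trained classifier: a fixed threshold on one feature."""
    return features["jaccard"] >= threshold


def detect_duplicates(new_question, history, k=2, threshold=0.5):
    """Run both stages and return the historical questions flagged as duplicates."""
    flagged = []
    for i, score in rank_candidates(new_question, history, k):
        features = pair_features(new_question, history[i])
        features["rank_score"] = score  # kept so a real classifier could use it
        if is_duplicate(features, threshold):
            flagged.append(history[i])
    return flagged
```

Calling `detect_duplicates(new_question, history)` with a list of previously asked questions returns the subset of top-ranked candidates that the placeholder classifier accepts. Only the candidates that survive the first stage are ever scored in the second stage, which is where the reduction in search space comes from. In a realistic setting, the threshold rule would be replaced by a model trained on labeled question pairs, with richer features than token overlap.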


Bibliographic record

Source
ACM Transactions on Internet Technology | 2018, Issue 3 | 21 pages
Authors
Zhang Wei Emma; Sheng Quan Z.; Lau Jey Han; Abebe Ermyas; Ruan Wenjie
Author affiliations

Macquarie Univ Dept Comp Sydney NSW 2109 Australia;

Macquarie Univ Dept Comp Sydney NSW 2109 Australia;

Univ Melbourne Dept Comp & Informat Syst Melbourne Vic 3010 Australia;

IBM Res Australia 204 Lygon St Melbourne Vic 3053 Australia;

Univ Oxford Dept Comp Sci Parks Rd Oxford OX1 3QD England;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Community-based question answering; question quality; classification; latent semantics; association rules;


