Journal: Procedia Computer Science

Enhancing Question Retrieval in Community Question Answering Using Word Embeddings

Abstract

Community Question Answering (CQA) services have become a popular way of seeking information online, where users interact and exchange knowledge in the form of questions and answers. In this paper, we study the problem of finding historical questions that are semantically equivalent to a queried question, on the assumption that the answers to similar questions should also answer the new one. The major challenge of question retrieval is the word mismatch problem: users can formulate the same question using different wording. Most existing methods measure the similarity between questions using a bag-of-words (BOW) representation, which captures no semantic relationships between words. This study therefore proposes to vectorize the questions with word embeddings, which can capture semantic and syntactic information from context. The questions are clustered using k-means to speed up the search and ranking tasks. The similarity between questions is measured as the cosine similarity of their weighted continuous-valued vectors. We run our experiments on real-world data sets from Yahoo! Answers in English and Arabic to show the efficiency and generality of the proposed method.
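The pipeline described in the abstract (embed each question, cluster the archive with k-means, then rank by cosine similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional embeddings in `EMB` are made-up toy values, the smoothed IDF weighting is an assumption (the abstract says only that the vectors are "weighted"), and the function names `build_index` and `search` are hypothetical.

```python
import math
import random

# Toy 2-d word embeddings (hypothetical values; a real system would load trained vectors).
EMB = {
    "fix": [0.9, 0.1], "repair": [0.88, 0.15], "laptop": [0.8, 0.3],
    "computer": [0.82, 0.28], "bake": [0.1, 0.9], "cake": [0.15, 0.95],
    "bread": [0.2, 0.85],
}

def idf_weights(docs):
    """Assumed weighting scheme: smoothed inverse document frequency per word."""
    n, df = len(docs), {}
    for doc in docs:
        for word in set(doc):
            df[word] = df.get(word, 0) + 1
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def embed(tokens, weights):
    """Weighted average of the embeddings of the tokens found in EMB."""
    dim = len(next(iter(EMB.values())))
    acc, total = [0.0] * dim, 0.0
    for t in tokens:
        if t in EMB:
            w = weights.get(t, 1.0)  # words unseen in the archive get weight 1
            total += w
            acc = [a + w * x for a, x in zip(acc, EMB[t])]
    return [a / total for a in acc] if total else acc

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means with Euclidean distance."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def build_index(questions, k):
    docs = [q.lower().split() for q in questions]
    weights = idf_weights(docs)
    vectors = [embed(d, weights) for d in docs]
    centroids, labels = kmeans(vectors, k)
    return weights, vectors, centroids, labels

def search(query, index, top_n=3):
    """Pick the nearest cluster, then rank only its members by cosine similarity."""
    weights, vectors, centroids, labels = index
    q = embed(query.lower().split(), weights)
    c = min(range(len(centroids)), key=lambda j: sqdist(q, centroids[j]))
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors) if labels[i] == c]
    return [i for _, i in sorted(scored, reverse=True)[:top_n]]

archive = ["fix laptop", "repair computer", "bake cake", "bake bread"]
index = build_index(archive, k=2)
for i in search("repair laptop", index):
    print(archive[i])
```

The speed-up comes from restricting the cosine ranking to the cluster nearest the query, so only part of the archive is scored. The sketch also shows how embeddings address word mismatch: "repair laptop" shares no word with "fix laptop", yet their vectors lie close together.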
