Manhattan Siamese LSTM for Question Retrieval in Community Question Answering

Abstract

Community Question Answering (cQA) platforms let users post questions and expect other users to provide answers. We focus on the task of question retrieval in cQA, which aims to retrieve previous questions that are similar to new queries. The past answers to those similar questions can then be used to respond to the new queries. The major challenges in this task are the shortness of the questions and the word mismatch problem, since users can formulate the same query using different wording. Although question retrieval has been widely studied over the years, it has received less attention in Arabic and still requires a non-trivial endeavour. In this paper, we address this task in both Arabic and English. We propose to use word embeddings, which can capture semantic and syntactic information from contexts, to vectorize the questions. To obtain longer sequences, questions are expanded with words that have close word vectors. The embedding vectors are fed into a Siamese LSTM model to take the global context of the questions into account. The similarity between questions is measured using the Manhattan distance. Experiments on a real-world Yahoo! Answers dataset show the efficiency of the method in Arabic and English.
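As a rough illustration of two steps named in the abstract, the sketch below expands a question with nearest-neighbour words and scores two encoded questions with a Manhattan-based similarity. It is not the authors' implementation: the toy 2-d embedding table, the function names, the cosine-based neighbour search, and the exp(-L1) form of the similarity (a common choice for Manhattan LSTM models) are all assumptions, and the LSTM encoder itself is omitted.

```python
import math

# Toy 2-d embeddings, invented for illustration only (not from the paper).
EMBEDDINGS = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "banana": [0.0, 1.0],
    "fruit": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_question(tokens, k=1):
    """Append, for each known token, its k nearest words in embedding space."""
    expanded = list(tokens)
    for tok in tokens:
        if tok not in EMBEDDINGS:
            continue  # out-of-vocabulary tokens are left unexpanded
        neighbours = sorted(
            (w for w in EMBEDDINGS if w != tok),
            key=lambda w: cosine(EMBEDDINGS[tok], EMBEDDINGS[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded


def manhattan_similarity(h1, h2):
    """Assumed form exp(-L1 distance): 1.0 for identical vectors, near 0 when far apart."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))
```

In a full pipeline, `h1` and `h2` would be the final hidden states that a shared (Siamese) LSTM produces for the two expanded questions; here they are plain lists, so the scoring step can be read on its own.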