Source: Proceedings of international conference on natural language processing and knowledge engineering

Searching Semantically Similar Questions from a Large Community-based Question Archive


Abstract

This paper presents a novel, fully statistical method for retrieving questions similar to a given query question from a large question archive. First, a word relevance model is trained on the whole archive, which consists of millions of natural language questions posted by web users. The model is used to find the words most semantically related to a given word. Second, to find questions semantically similar to the query, each non-stopword in a question is expanded with the help of the word relevance model and represented as a word vector whose elements are the word itself and some of its semantically related words. The elements are weighted by combining a classical IR term weighting method with the word transformation probability learned from the relevance model. Each question is then mapped to a question vector, defined as the normalized center of the word vectors of the words it contains. Question retrieval then reduces to comparing the similarity between question vectors. The method is in effect a simple kernel approach based on question expansion. Experimental results indicate that the proposed method outperforms baseline methods such as the Vector Space Model (VSM) and the Language Model for Information Retrieval (LMIR).
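The pipeline described in the abstract (expand each word, weight the expansion, take the normalized center, compare vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `RELEVANCE` table is hand-written rather than trained on an archive, the stopword list and `ARCHIVE` are toy data, smoothed IDF stands in for the "classical IR term weighting method", the weight of a related word is taken as IDF times transformation probability, and cosine similarity is used to compare question vectors.

```python
import math
from collections import Counter

# Toy stopword list (assumption; the paper does not give one).
STOPWORDS = {"how", "do", "i", "my", "to", "a", "what", "is", "the"}

# Hypothetical word relevance table: word -> {related word: transformation
# probability}. In the paper this is learned from the archive; here it is
# hand-written purely for illustration.
RELEVANCE = {
    "fix": {"repair": 0.6, "mend": 0.3},
    "repair": {"fix": 0.6},
    "laptop": {"notebook": 0.7, "computer": 0.5},
    "notebook": {"laptop": 0.7},
    "bake": {"cook": 0.5},
}

# Toy question archive (assumption).
ARCHIVE = [
    "How to repair a notebook?",
    "How do I bake a cake?",
    "What is the best laptop?",
]


def tokenize(question):
    """Lowercase, drop question marks, and remove stopwords."""
    words = question.lower().replace("?", " ").split()
    return [w for w in words if w not in STOPWORDS]


def build_idf(archive):
    """Return a smoothed IDF function estimated from the archive."""
    n = len(archive)
    df = Counter()
    for q in archive:
        df.update(set(tokenize(q)))

    def idf(word):
        # Unseen words have df == 0 and receive the largest weight.
        return math.log((n + 1) / (df[word] + 1)) + 1.0

    return idf


def word_vector(word, idf):
    """Expand a word into itself plus its related words, with weights."""
    base = idf(word)
    vec = {word: base}  # the word itself, transformation probability 1
    for related, prob in RELEVANCE.get(word, {}).items():
        vec[related] = base * prob  # assumed weighting: IDF * probability
    return vec


def question_vector(question, idf):
    """Sum the word vectors and scale to unit length (normalized center)."""
    total = Counter()
    for word in tokenize(question):
        for term, weight in word_vector(word, idf).items():
            total[term] += weight
    norm = math.sqrt(sum(w * w for w in total.values()))
    if norm == 0.0:
        return {}
    return {t: w / norm for t, w in total.items()}


def similarity(u, v):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def search(query, archive):
    """Rank archive questions by similarity to the query, best first."""
    idf = build_idf(archive)
    qv = question_vector(query, idf)
    scored = [(q, similarity(qv, question_vector(q, idf))) for q in archive]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `search("How do I fix my laptop?", ARCHIVE)` ranks "How to repair a notebook?" first in this toy setup, even though the two questions share no non-stopword. A plain term-overlap vector model would give that pair a score of zero, which is the gap the expansion step is meant to close.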
