
Semantic Question Answering Using Wikipedia Categories Clustering



Abstract

We describe a system that performs semantic question answering by combining classic Information Retrieval (IR) methods with semantic ones. First, we use a search engine to gather web pages and then apply a noun phrase extractor to extract all candidate answer entities from them. Candidate entities are ranked using a linear combination of two IR measures to pick the most relevant ones. For each of the top-ranked candidate entities, we find the corresponding Wikipedia page. We then propose a novel way to exploit the semantic information contained in the structure of Wikipedia. A vector is built for every entity from its Wikipedia category names by splitting them into words and lemmatizing those words. These vectors preserve semantic information in the sense that they let us measure semantic closeness between entities. Based on this, we apply an intelligent clustering method to the candidate entities and show that the candidate entities in the biggest cluster are the most semantically related to the ideal answers to the query. Results on the topics of the TREC 2009 Related Entity Finding task dataset show promising performance.
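The category-vector and clustering steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not name the lemmatizer, the similarity measure, or the clustering method, so the sketch substitutes a crude plural-stripping `crude_lemma`, cosine similarity over word counts, and single-link clustering with a hypothetical `threshold`. The entity names and category lists are made up.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stopword list; the abstract does not mention one.
STOPWORDS = {"of", "in", "the", "and", "by", "from", "for"}


def crude_lemma(word):
    """Rough stand-in for a real lemmatizer: only strips simple plurals."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def category_vector(categories):
    """Split category names into words, lemmatize them, and count them."""
    words = []
    for name in categories:
        for token in re.findall(r"[a-z]+", name.lower()):
            if token not in STOPWORDS:
                words.append(crude_lemma(token))
    return Counter(words)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def largest_cluster(vectors, threshold=0.25):
    """Single-link clustering: link entities whose similarity reaches the
    threshold, then return the biggest connected group."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return max(clusters.values(), key=len)


# Made-up candidate entities with made-up category names.
candidates = {
    "Lufthansa": ["Airlines of Germany", "Star Alliance"],
    "Air France": ["Airlines of France", "SkyTeam"],
    "Delta Air Lines": ["Airlines of the United States", "SkyTeam"],
    "Boeing 747": ["Wide-body aircraft", "Boeing aircraft"],
}
vectors = {name: category_vector(cats) for name, cats in candidates.items()}
print(sorted(largest_cluster(vectors)))
```

In this toy input the three airline entities share the lemma `airline` and end up in one cluster, while the aircraft entity shares no words with them and stays on its own. The threshold value is arbitrary and would need tuning on real category data.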
