Enabling Entity Retrieval by Exploiting Wikipedia as a Semantic Knowledge Source

Abstract

This dissertation research, PanAnthropon FilmWorld, aims to demonstrate direct retrieval of entities and related facts by exploiting Wikipedia as a semantic knowledge source, with the film domain as its proof-of-concept application domain. To this end, a semantic knowledge base for the film domain has been constructed from data extracted or derived from 10,640 Wikipedia pages on films and additional pages on film awards. The knowledge base currently contains 209,266 entities and 2,345,931 entity-centric facts. Both the knowledge base and the corresponding semantic search interface are based on a coherent classification of entities. Entity-centric facts are also consistently represented as <entity, attribute, value, note> tuples. The semantic search interface supports multiple types of semantic search functions that go beyond traditional keyword-based search. The main one is the General Entity Retrieval Query (GERQ) function, which retrieves all entities that match the specified entity type, subtype, and semantic conditions and thus corresponds to the main research problem. Two types of evaluation have been performed in order to assess (1) the quality of information extraction and (2) the effectiveness of information retrieval using the semantic interface. The first type of evaluation was performed by inspecting 11,495 film-centric facts concerning 100 films. The results confirmed high data quality, with 99.96% average precision and 99.84% average recall. The second type of evaluation was an experiment with human subjects, who performed a retrieval task using both the PanAnthropon interface and the Internet Movie Database (IMDb) interface so that their task performance could be compared between the two interfaces. The results confirmed higher effectiveness of the PanAnthropon interface vs. the IMDb interface (83.11% vs. 40.78% average precision; 83.55% vs. 40.26% average recall). Moreover, the subjects' responses to the post-task questionnaire indicate that they found the PanAnthropon interface to be highly usable and easily understandable as well as highly effective. The main contribution of this research therefore consists in achieving the set research goal, namely, demonstrating the utility and feasibility of semantics-based direct entity retrieval.
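The abstract describes facts stored as four-part tuples and a GERQ function that filters entities by type, subtype, and semantic conditions. The following is only a minimal sketch of that idea, not the PanAnthropon implementation: the class names `Entity` and `Fact`, the functions `gerq` and `precision_recall`, the modelling of a semantic condition as an (attribute, value) pair, and all sample data ("Film A", "Person X") are assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    # Hypothetical entity record: a name plus its type and subtype.
    name: str
    etype: str
    subtype: str


@dataclass(frozen=True)
class Fact:
    # Hypothetical four-part fact tuple: entity, attribute, value, note.
    entity: str
    attribute: str
    value: str
    note: Optional[str] = None


def gerq(entities: Iterable[Entity], facts: Iterable[Fact],
         etype: str, subtype: str,
         conditions: Iterable[Tuple[str, str]]) -> List[str]:
    """Return sorted names of entities with the given type and subtype
    whose facts satisfy every (attribute, value) condition."""
    conditions = list(conditions)
    # Index the facts by entity name for direct lookup.
    by_entity = {}
    for f in facts:
        by_entity.setdefault(f.entity, set()).add((f.attribute, f.value))
    return sorted(
        e.name for e in entities
        if e.etype == etype and e.subtype == subtype
        and all(c in by_entity.get(e.name, set()) for c in conditions)
    )


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of a retrieved list against a gold list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented toy data, not taken from the knowledge base.
ENTITIES = [
    Entity("Film A", "Film", "Feature"),
    Entity("Film B", "Film", "Feature"),
    Entity("Film C", "Film", "Short"),
]
FACTS = [
    Fact("Film A", "director", "Person X"),
    Fact("Film A", "genre", "Drama"),
    Fact("Film B", "director", "Person X"),
    Fact("Film B", "genre", "Comedy"),
    Fact("Film C", "director", "Person X", note="co-directed"),
]

if __name__ == "__main__":
    found = gerq(ENTITIES, FACTS, "Film", "Feature", [("director", "Person X")])
    print(found)
    print(precision_recall(found, ["Film A"]))
```

In this sketch a query returns entity names directly instead of a ranked list of documents, which is the contrast with keyword-based search that the abstract draws. The precision and recall helper shows the set-based definitions of the two measures; how the figures in the abstract were averaged is not specified here.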

Bibliographic record

  • Source
    ACM SIGIR FORUM | 2012, No. 1 | p. 80 | 1 page
  • Author

    Sofia J. Athenikos

  • Affiliation

    College of Info Science and Technology, Drexel University, Philadelphia, PA, USA

  • Original format: PDF
  • Language: English
