Published in: IEEE International Conference on Data Engineering

Inferencing in information extraction: Techniques and applications

Abstract

Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines started incorporating knowledge into their search results a couple of years ago [1]. Users increasingly expect structured knowledge as answers to their search needs. For example, Bing's result page for “Lionel Messi” is full of structured facts, such as his birthday and awards. Research aimed at improving the accuracy and coverage of such knowledge bases has led to significant advances in information extraction techniques [2], [3]. As the initial challenge of accurately extracting facts for popular entities is being addressed, more difficult challenges have emerged, such as extending knowledge coverage to long-tail entities and domains, understanding the interestingness and usefulness of facts within a given context, and addressing information-seeking needs more directly and accurately.

In this tutorial, we will survey recent research efforts and introduce the techniques that address these challenges, as well as the applications that benefit from adopting them. In particular, this tutorial will focus on a variety of techniques that can broadly be viewed as knowledge inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and derive new knowledge. More specifically, we focus on four main categories of inferencing techniques: 1) deep natural language processing using machine learning, 2) data cleaning using integrity constraints, 3) large-scale probabilistic reasoning, and 4) leveraging human expertise for domain knowledge extraction.
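To make the idea of knowledge inferencing more concrete, here is a minimal Python sketch, not taken from the tutorial, that combines two of the ideas named above: verifying a fact by aggregating evidence from several extraction sources, and then cleaning the result with an integrity constraint. It assumes hypothetical extractions given as (subject, predicate, object, source, confidence) tuples, uses noisy-OR as a simple stand-in for probabilistic fusion (which presumes that sources err independently), and treats `birth_date` as a functional predicate, i.e. at most one value per entity. All entity names, sources, scores, and function names are illustrative assumptions.

```python
from collections import defaultdict
from math import prod

# Hypothetical extraction output: (subject, predicate, object, source, confidence).
# Entities, sources, and scores are made up purely for illustration.
EXTRACTIONS = [
    ("entity_a", "birth_date", "2000-01-01", "source_1", 0.9),
    ("entity_a", "birth_date", "2000-01-01", "source_2", 0.6),
    ("entity_a", "birth_date", "2000-01-02", "source_3", 0.7),
    ("entity_a", "award", "award_x", "source_1", 0.8),
    ("entity_a", "award", "award_y", "source_2", 0.5),
]

# Assumed integrity constraint: these predicates are functional,
# so each subject may hold at most one object value for them.
FUNCTIONAL_PREDICATES = {"birth_date"}


def noisy_or(confidences):
    """Probability that a fact is true if at least one supporting source
    is right, assuming the sources err independently of each other."""
    return 1.0 - prod(1.0 - c for c in confidences)


def fuse(extractions):
    """Group identical (subject, predicate, object) facts across sources
    and combine their confidences with noisy-OR."""
    grouped = defaultdict(list)
    for subj, pred, obj, _source, conf in extractions:
        grouped[(subj, pred, obj)].append(conf)
    return {fact: noisy_or(confs) for fact, confs in grouped.items()}


def enforce_functional(scored_facts, functional):
    """Resolve violations of the functional-predicate constraint by keeping
    only the highest-scoring value per (subject, predicate) pair."""
    kept = {}
    best = {}  # (subject, predicate) -> (object, score)
    for (subj, pred, obj), score in scored_facts.items():
        if pred not in functional:
            # Non-functional predicates (e.g. awards) may have many values.
            kept[(subj, pred, obj)] = score
            continue
        key = (subj, pred)
        if key not in best or score > best[key][1]:
            best[key] = (obj, score)
    for (subj, pred), (obj, score) in best.items():
        kept[(subj, pred, obj)] = score
    return kept


if __name__ == "__main__":
    cleaned = enforce_functional(fuse(EXTRACTIONS), FUNCTIONAL_PREDICATES)
    for fact, score in sorted(cleaned.items()):
        print(fact, f"{score:.2f}")
```

Here the two agreeing sources raise the first birth date to a combined score of about 0.96, so the conflicting value is dropped, while both awards survive because `award` is not declared functional. Noisy-OR is used only for brevity; its independence assumption often fails for Web sources that copy from one another, which is one reason richer probabilistic models are of interest.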