
Unsupervised discovery of extraction patterns for information extraction.


Abstract

The task of Information Extraction (IE) is to find specific types of information in natural language text. In particular, event extraction identifies instances of a particular type of event or fact (a particular "scenario"), including the entities involved, and fills a database that has been pre-defined for the scenario. As the number of documents available on-line has multiplied, event extraction has grown in importance for various applications, such as tracking terrorist activities from newswire sources and building a database of job postings from the Web.

Linguistic contexts, such as predicate-argument relationships, have been widely used as extraction patterns to identify the items to be extracted from the text. The cost of creating extraction patterns for each scenario has been a bottleneck limiting the portability of information extraction systems to different scenarios, although there has been some research on semi-supervised pattern discovery procedures to reduce this cost. The challenge is to develop a fully automatic method for identifying extraction patterns for a scenario specified by the user.

This dissertation presents a novel approach for the unsupervised discovery of extraction patterns for event extraction from raw text. First, we present a framework that gives the user a self-customizing information extraction system for his/her query: the Query-Driven Information Extraction (QDIE) framework. The input to the QDIE framework is the user's query: either a set of keywords or a narrative description of the event extraction task.

Second, we assess the improvement in extraction pattern models. By considering the shortcomings of the prior work based on predicate-argument models and their extensions, we propose a novel extraction pattern model that is based on arbitrary subtrees of dependency trees.

Third, we address the issue of portability across languages. As a case study of the QDIE framework, we implemented a pre-CODIE system, a Cross-Lingual On-Demand Information Extraction system requiring minimal human intervention, which incorporates the QDIE framework as a component for pattern discovery. In addition, we assess the role of machine translation in cross-lingual information extraction by comparing translation-based implementations.
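
To make the phrase "arbitrary subtrees of dependency trees" concrete, the following is a minimal Python sketch that enumerates every connected subtree of a toy dependency tree. The sentence, the `TREE` dictionary, and the function names are illustrative assumptions, not taken from the dissertation; the abstract does not describe how candidate patterns are actually enumerated or ranked, so this shows only the shape of the candidate space.

```python
from itertools import product

# Hypothetical toy dependency tree for "bomb killed three people":
# each head word maps to the list of its dependents.
TREE = {
    "killed": ["bomb", "people"],
    "bomb": [],
    "people": ["three"],
    "three": [],
}


def rooted_subtrees(tree, node):
    """Return every connected subtree whose top node is `node`.

    Each subtree is represented as a frozenset of its node labels.
    """
    # For each dependent: either leave it out (None) or attach one
    # subtree rooted at that dependent.
    options = [[None] + rooted_subtrees(tree, child) for child in tree[node]]
    result = []
    for choice in product(*options):
        nodes = {node}
        for sub in choice:
            if sub is not None:
                nodes |= sub
        result.append(frozenset(nodes))
    return result


def all_subtrees(tree):
    """Collect the connected subtrees rooted at every node of the tree."""
    return [s for node in tree for s in rooted_subtrees(tree, node)]


if __name__ == "__main__":
    for subtree in all_subtrees(TREE):
        print(sorted(subtree))
```

In this toy tree, a candidate such as {killed, people, three} spans more than a verb and its direct arguments, while {killed, three} is never produced because the two words are not connected without "people".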

Bibliographic record

  • Author: Sudo, Kiyoshi
  • Author affiliation: New York University
  • Degree-granting institution: New York University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 105 p.
  • Total pages: 105
  • Original format: PDF
  • Language of text: English
