Conference: Knowledge-Based Systems for Safety Critical Applications

Querying text databases for efficient information extraction

Abstract

A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. We develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.
