Scalable Ad-hoc Entity Extraction from Text Collections


Abstract

Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc" entity extraction task where entities of interest are constrained to be from a list of entities that is specific to the task. In such scenarios, traditional entity extraction techniques that process all the documents for each ad-hoc entity extraction task can be significantly expensive. In this paper, we propose an efficient approach that leverages the inverted index on the documents to identify the subset of documents relevant to the task and processes only those documents. We demonstrate the efficiency of our techniques on real datasets.
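
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: consult an inverted index to find the documents that could mention an entity from the task-specific list, then run the (more expensive) matching step on those documents alone. The function names, the tokenizer, the exact-phrase matching rule, and the toy documents and entity names are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens (an illustrative tokenizer, not the paper's)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def candidate_docs(index, entity):
    """Ids of documents containing every token of the entity (posting-list intersection)."""
    toks = tokenize(entity)
    if not toks:
        return set()
    postings = [index.get(t, set()) for t in toks]
    return set.intersection(*postings)


def contains_phrase(doc_tokens, entity_tokens):
    """True if entity_tokens occur as a contiguous run inside doc_tokens."""
    n = len(entity_tokens)
    return any(doc_tokens[i:i + n] == entity_tokens
               for i in range(len(doc_tokens) - n + 1))


def extract(docs, index, entities):
    """Return {entity: [doc ids mentioning it]}, scanning only candidate documents."""
    results = {}
    for entity in entities:
        etoks = tokenize(entity)
        results[entity] = [d for d in sorted(candidate_docs(index, entity))
                           if contains_phrase(tokenize(docs[d]), etoks)]
    return results


# Hypothetical toy collection and task-specific entity list.
docs = {
    1: "The Acme Zoom 900 is a compact camera.",
    2: "Bolt released a new camera this year.",
    3: "Acme sells the 900 series, but the Zoom line is older.",
}
entities = ["Acme Zoom 900", "Bolt"]

index = build_inverted_index(docs)
results = extract(docs, index, entities)
print(results)
```

In this toy run, document 3 contains all three tokens of "Acme Zoom 900" and so survives the index lookup, but it is rejected by the phrase check; document 2 is never scanned for that entity at all. The saving grows with the fraction of the collection that the index lookup rules out.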


Bibliographic details

Source: International Conference on Very Large Data Bases (VLDB 2008), 2008, pp. 944-956 (13 pages)
Authors: Sanjay Agrawal; Kaushik Chakrabarti; Surajit Chaudhuri; Venkatesh Ganti
Original format: PDF

