Extending Dictionary-based Entity Extraction to Tolerate Errors

Abstract

Entity extraction (also known as entity recognition) extracts entities (e.g., person names, locations, companies) from text. Approximate (dictionary-based) entity extraction is a recent trend to improve extraction quality: it extracts substrings of the text that approximately match predefined entities in a given dictionary. In this paper, we study the problem of approximate entity extraction with edit-distance constraints. A straightforward method first extracts all substrings from the text and then, for each substring, identifies its similar entities in the dictionary using existing methods for approximate string search. However, many substrings of the text overlap, which gives an opportunity to share computation across the overlaps and avoid unnecessary duplicate work. To this end, we propose a heap-based framework to efficiently extract entities. We have implemented our techniques, and the experimental results show that our method achieves high performance and significantly outperforms existing methods.
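
The straightforward method described in the abstract can be sketched as follows. This is only an illustration of the naive baseline, not the paper's heap-based framework, whose details the abstract does not give. The names `edit_distance`, `naive_extract`, and the threshold parameter `tau` are hypothetical and chosen for this sketch.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def naive_extract(text: str, dictionary: List[str], tau: int) -> List[Tuple[int, int, str]]:
    """Naive baseline: test every candidate substring against every entity.

    Returns sorted (start, end, entity) triples such that
    edit_distance(text[start:end], entity) <= tau.
    """
    if not dictionary:
        return []
    # A substring within distance tau of an entity differs in length by at most tau.
    min_len = max(1, min(len(e) for e in dictionary) - tau)
    max_len = max(len(e) for e in dictionary) + tau
    matches = set()
    for start in range(len(text)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(text):
                break
            sub = text[start:end]
            for entity in dictionary:
                if abs(len(entity) - len(sub)) > tau:
                    continue  # length filter: cannot be within tau
                if edit_distance(sub, entity) <= tau:
                    matches.add((start, end, entity))
    return sorted(matches)


if __name__ == "__main__":
    # The misspelling "chaudri" is within edit distance 2 of "chaudhuri".
    for match in naive_extract("surajit chaudri wrote", ["chaudhuri"], tau=2):
        print(match)
```

Neighbouring substrings such as `text[8:15]` and `text[8:16]` share most of their characters, yet this sketch recomputes the distance for each one from scratch. That repeated work on overlapping substrings is the redundancy the abstract says the proposed framework avoids.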

Bibliographic details

Source: ACM Conference on Information and Knowledge Management, 2010, 4 pages
Authors: Guoliang Li; Dong Deng; Jianhua Feng
Original format: PDF
Chinese Library Classification: Information processing
Keywords: approximate entity extraction; edit distance; heap

Similar literature

1. Boosting approximate dictionary-based entity extraction with synonyms [J]. Information Sciences: An International Journal, 2020.
2. A unified framework for approximate dictionary-based entity extraction [J]. Dong Deng, Guoliang Li, Jianhua Feng. The VLDB Journal, 2015, no. 1.
3. Key geometric error extraction of machine tool based on extended Fourier amplitude sensitivity test method [J]. Cheng Qiang, Sun Bingwei, Liu Zhifeng. The International Journal of Advanced Manufacturing Technology, 2017, no. 9-12.
4. Extending Dictionary-based Entity Extraction to Tolerate Errors [C]. Guoliang Li, Dong Deng, Jianhua Feng. CIKM 10; ACM conference on information and knowledge management, 2011.
5. Learning for information extraction: From named entity recognition and disambiguation to relation extraction [D]. Bunescu, Razvan Constantin. 2007.
6. Chemical entity recognition in patents by combining dictionary-based and statistical approaches [O]. Saber A. Akhondi, Ewoud Pons, Zubair Afzal. 2016.
7. Extending autocompletion to tolerate errors [O]. Surajit Chaudhuri, Raghav Kaushik. 2009.
