Conference: ACM Conference on Information and Knowledge Management

Extending Dictionary-based Entity Extraction to Tolerate Errors

Abstract

Entity extraction (also known as entity recognition) identifies entities such as person names, locations, and companies in text. Approximate dictionary-based entity extraction, a recent approach to improving extraction quality, finds substrings of the text that approximately match predefined entities in a given dictionary. In this paper, we study approximate entity extraction under edit-distance constraints. A straightforward method first enumerates all substrings of the text and then, for each substring, finds its similar entities in the dictionary using existing approximate string search methods. However, many of these substrings overlap, which creates an opportunity to share computation across the overlaps and avoid redundant work. To this end, we propose a heap-based framework for efficient entity extraction. We have implemented our techniques, and experimental results show that our method achieves high performance and significantly outperforms existing methods.
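
To make the straightforward method concrete, here is a minimal Python sketch of that baseline only. It is not the heap-based framework proposed in the paper, which the abstract does not describe in enough detail to reproduce. The function names (`edit_distance_within`, `naive_extract`) and the threshold parameter `tau` are illustrative assumptions, and a plain dynamic-programming check stands in for the "existing approximate string search methods" the abstract mentions.

```python
from typing import List, Tuple


def edit_distance_within(a: str, b: str, tau: int) -> bool:
    """Return True if the Levenshtein distance between a and b is at most tau."""
    # The length difference is a lower bound on the edit distance.
    if abs(len(a) - len(b)) > tau:
        return False
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        # Row minima never decrease, so the final distance must exceed tau.
        if min(curr) > tau:
            return False
        prev = curr
    return prev[-1] <= tau


def naive_extract(text: str, dictionary: List[str], tau: int) -> List[Tuple[int, int, str]]:
    """Baseline sketch: test every candidate substring against every entity.

    Returns (start, end, entity) triples such that text[start:end] is within
    edit distance tau of entity.
    """
    matches = []
    for entity in dictionary:
        # Only substrings whose length is within tau of the entity can match.
        min_len = max(1, len(entity) - tau)
        max_len = len(entity) + tau
        for start in range(len(text)):
            for length in range(min_len, max_len + 1):
                end = start + length
                if end > len(text):
                    break
                if edit_distance_within(text[start:end], entity, tau):
                    matches.append((start, end, entity))
    return matches


if __name__ == "__main__":
    # Hypothetical input: "serch" is a misspelling of the entity "search".
    for start, end, entity in naive_extract("an approximate string serch method", ["search"], 1):
        print(start, end, entity)
```

Neighbouring candidate substrings differ by only a character or two, yet this baseline verifies each one from scratch. That repeated work across overlapping substrings is the redundancy the abstract says shared computation can avoid.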
