Conference: European Conference on Advances in Databases and Information Systems

Integrating Approximate String Matching with Phonetic String Similarity

Abstract

Well-defined dictionaries of tagged entities are used in many tasks to identify entities when the scope is limited and machine learning is not needed. A common solution is to encode the input dictionary in a trie and use it to find matches in an input text. However, both a large dictionary and spelling errors in the input tokens degrade such solutions. We present an approach that transforms the dictionary and each input token into a compact, well-known phonetic representation. The resulting dictionary is encoded in a trie that is about 72% smaller than a non-phonetic trie. We perform inexact matching over this representation to obtain an initial set of candidate results. Lastly, we apply a second similarity measure to select the best candidate with which to annotate a given entity. The experiments showed that the approach achieves good F1 scores. The solution was developed as an entity recognition plug-in for GATE, a well-known information extraction framework.
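The abstract does not name the phonetic code, the inexact-matching algorithm, or the second similarity measure, so the following is only a minimal sketch of the pipeline under stated assumptions: Soundex stands in for the phonetic representation, inexact matching is a Levenshtein-bounded walk over the phonetic trie, and Python's `difflib.SequenceMatcher` ratio on the original spellings stands in for the second measure. The names `soundex`, `PhoneticTrie`, `annotate`, `max_dist`, and `min_sim` are hypothetical, and this is not the authors' GATE plug-in.

```python
from difflib import SequenceMatcher

# Soundex letter groups. Soundex stands in for the paper's unnamed
# "well-known phonetic representation" (assumption).
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {c: d for d, letters in _GROUPS.items() for c in letters}


def soundex(word):
    """American Soundex key: first letter followed by three digits."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first, prev, digits = letters[0], _CODES.get(letters[0], ""), []
    for c in letters[1:]:
        code = _CODES.get(c, "")  # vowels, H, W and Y have no code
        if code and code != prev:
            digits.append(code)
        if c not in "HW":  # H and W do not separate equal codes; vowels do
            prev = code
    return (first + "".join(digits) + "000")[:4]


class TrieNode:
    def __init__(self):
        self.children = {}  # phonetic symbol -> TrieNode
        self.entries = []   # original dictionary entries whose key ends here


class PhoneticTrie:
    """Trie over phonetic keys; entries that sound alike share one path."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, entry):
        key = soundex(entry)
        if not key:
            return
        node = self.root
        for symbol in key:
            node = node.children.setdefault(symbol, TrieNode())
        node.entries.append(entry)

    def search(self, token, max_dist=1):
        """Return (entry, distance) pairs whose key is within max_dist edits."""
        key = soundex(token)
        results = []
        if not key:
            return results
        first_row = list(range(len(key) + 1))
        for symbol, child in self.root.children.items():
            self._walk(child, symbol, key, first_row, max_dist, results)
        return results

    def _walk(self, node, symbol, key, prev_row, max_dist, results):
        # One row of the Levenshtein table for the trie prefix ending in `symbol`.
        row = [prev_row[0] + 1]
        for i in range(1, len(key) + 1):
            cost = 0 if key[i - 1] == symbol else 1
            row.append(min(row[i - 1] + 1, prev_row[i] + 1, prev_row[i - 1] + cost))
        if row[-1] <= max_dist:
            results.extend((entry, row[-1]) for entry in node.entries)
        # Prune: no descendant can score below the smallest value in this row.
        if min(row) <= max_dist:
            for next_symbol, child in node.children.items():
                self._walk(child, next_symbol, key, row, max_dist, results)


def annotate(trie, token, max_dist=1, min_sim=0.75):
    """Re-rank phonetic candidates by surface similarity; None if none is close."""
    best, best_sim = None, 0.0
    for entry, _ in trie.search(token, max_dist):
        sim = SequenceMatcher(None, token.lower(), entry.lower()).ratio()
        if sim > best_sim:
            best, best_sim = entry, sim
    return best if best_sim >= min_sim else None


trie = PhoneticTrie()
for name in ["Smith", "Schmidt", "Robert", "Rupert"]:
    trie.insert(name)

print(annotate(trie, "Smyth"))  # Smith
print(annotate(trie, "Robrt"))  # Robert
```

In this toy dictionary "Smith" and "Schmidt" both map to the key S530 and share a single trie path, which illustrates how a phonetic trie can end up smaller than one built over raw spellings; the 72% reduction is the paper's own measurement and is not reproduced here. The phonetic step tolerates the misspelling, and the second measure then breaks the tie between entries that sound alike.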