Conference: Annual Meeting of the Association for Computational Linguistics
Optimal Transport-based Alignment of Learned Character Representations for String Similarity

Abstract

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn Iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
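
The alignment step described in the abstract can be illustrated with a minimal sketch. This is not the STANCE implementation. It assumes fixed random character vectors in place of the learned encoder, a cosine-distance cost, and uniform mass on each character, and it omits the convolutional scoring network. The names `toy_char_embeddings`, `sinkhorn_plan`, and `align`, and the parameters `reg` and `n_iters`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def toy_char_embeddings(s, dim=16, seed=0):
    """Hypothetical stand-in for a learned character encoder.

    Each character is mapped to a fixed random unit vector.
    """
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(256, dim))
    table /= np.linalg.norm(table, axis=1, keepdims=True)
    return table[[ord(c) % 256 for c in s]]


def sinkhorn_plan(cost, reg=0.5, n_iters=500):
    """Entropy-regularized optimal transport via Sinkhorn iteration.

    Assumes uniform marginals over the characters of each string.
    Returns a transport plan with the same shape as `cost`.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # mass on each character of string 1
    b = np.full(m, 1.0 / m)      # mass on each character of string 2
    kernel = np.exp(-cost / reg) # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)   # rescale to match column marginals
        u = a / (kernel @ v)     # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


def align(s1, s2, reg=0.5, n_iters=500):
    """Soft-align the characters of two strings; returns (plan, cost)."""
    e1 = toy_char_embeddings(s1)
    e2 = toy_char_embeddings(s2)
    cost = 1.0 - e1 @ e2.T       # cosine distance between unit vectors
    return sinkhorn_plan(cost, reg, n_iters), cost


if __name__ == "__main__":
    plan, cost = align("robert", "bob")
    print(plan.round(3))         # one row per character of "robert"
    print(float((plan * cost).sum()))  # total transport cost of the alignment
```

Each entry of the returned plan gives how much mass one character of the first string sends to one character of the second. In the model the abstract describes, an alignment of this kind is passed to a convolutional network for scoring. The sketch stops at the plan itself.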