Optimal Transport-based Alignment of Learned Character Representations for String Similarity


Abstract

String similarity models are vital for record linkage, entity resolution, and search. In this work, we present STANCE, a learned model for computing the similarity of two strings. Our approach encodes the characters of each string, aligns the encodings using Sinkhorn iteration (alignment is posed as an instance of optimal transport), and scores the alignment with a convolutional neural network. We evaluate STANCE's ability to detect whether two strings can refer to the same entity, a task we term alias detection. We construct five new alias detection datasets (and make them publicly available). We show that STANCE (or one of its variants) outperforms both state-of-the-art and classic, parameter-free similarity models on four of the five datasets. We also demonstrate STANCE's ability to improve downstream tasks by applying it to an instance of cross-document coreference and show that it leads to a 2.8 point improvement in B³ F1 over the previous state-of-the-art approach.
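
The alignment step named in the abstract can be illustrated with a minimal sketch of Sinkhorn iteration for entropy-regularized optimal transport. This is not the authors' implementation. STANCE derives its alignment from learned character encodings and scores the result with a convolutional network, whereas the sketch below assumes a hypothetical 0/1 character-mismatch cost, uniform marginals, and illustrative names (`char_cost`, `sinkhorn`, `reg`, `n_iters`) that do not come from the paper.

```python
import math


def char_cost(s, t):
    """Hypothetical stand-in for learned encodings: cost 0 if characters match, else 1."""
    return [[0.0 if cs == ct else 1.0 for ct in t] for cs in s]


def sinkhorn(cost, reg=0.1, n_iters=200):
    """Return a transport plan for `cost` with uniform row and column marginals.

    `reg` is the entropy regularization strength; smaller values give
    sharper (closer to one-to-one) alignments.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each character of the first string
    b = [1.0 / m] * m  # mass assigned to each character of the second string
    # Gibbs kernel: low-cost pairs get large kernel values.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan entry (i, j) is the mass moved from character i to character j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Soft alignment between the characters of two example strings.
plan = sinkhorn(char_cost("robert", "bob"))
```

Each row of `plan` spreads one character's mass over the characters of the other string, so matching characters (for example the two "o"s) receive most of that row's mass. A learned scorer would then take such a matrix as input.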


Bibliographic details

Source: Annual Meeting of the Association for Computational Linguistics, 2019, pp. 5907-5917 (11 pages)
Authors: Derek Tam; Nicholas Monath; Ari Kobren; Aaron Traylor; Rajarshi Das; Andrew McCallum
Original format: PDF

