Conference on Empirical Methods in Natural Language Processing

Combining String and Context Similarity for Bilingual Term Alignment from Comparable Corpora

Abstract

Automatically compiling bilingual dictionaries of technical terms from comparable corpora is a challenging problem with many potential applications. In this paper, we exploit two independent observations about term translations: (a) terms are often formed from corresponding sub-lexical units across languages, and (b) a term and its translation tend to appear in similar lexical contexts. Based on the first observation, we develop a new character n-gram compositional method, a logistic regression classifier, that learns a string similarity measure for term translations. Based on the second observation, we use an existing context-based approach. For evaluation, we investigate the performance of the compositional and context-based methods on (a) similar and unrelated languages, (b) corpora of differing degrees of comparability, and (c) the translation of frequent and rare terms. Finally, we combine the two translation clues, namely string and contextual similarity, in a linear model and show substantial improvements over each of the two signals used alone.
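
The idea of combining the two clues in a linear model can be illustrated with a minimal sketch. It is not the method from the paper. The string score here is a plain Dice coefficient over character n-grams, used as a stand-in for the learned logistic regression classifier, and the context score is a cosine similarity over sparse context vectors that are assumed to already share one vocabulary (for example after projection through a seed dictionary). All function names, the `weight` parameter, and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n=3):
    """Return the multiset of character n-grams of a term, padded with '#'."""
    padded = "#" + term.lower() + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def string_similarity(source, target, n=3):
    """Dice coefficient over character n-grams.

    Assumption: a simple stand-in for a learned string similarity;
    it is not the classifier described in the abstract.
    """
    a, b = char_ngrams(source, n), char_ngrams(target, n)
    overlap = sum((a & b).values())  # multiset intersection
    total = sum(a.values()) + sum(b.values())
    return 2.0 * overlap / total if total else 0.0


def context_similarity(vec_a, vec_b):
    """Cosine similarity between sparse context vectors (dict: word -> weight).

    Assumption: both vectors use the same vocabulary.
    """
    dot = sum(w * vec_b.get(k, 0.0) for k, w in vec_a.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(s_string, s_context, weight=0.5):
    """Linear interpolation of the two scores; weight is a hypothetical parameter."""
    return weight * s_string + (1.0 - weight) * s_context


def rank_candidates(source, source_ctx, candidates, weight=0.5):
    """Rank target-language candidates (dict: term -> context vector) by combined score."""
    scored = [
        (combined_score(string_similarity(source, term),
                        context_similarity(source_ctx, ctx),
                        weight), term)
        for term, ctx in candidates.items()
    ]
    return [term for _, term in sorted(scored, reverse=True)]


# Toy usage with made-up context vectors.
candidates = {
    "hepatite": {"liver": 1.0, "virus": 1.0},
    "maison": {"house": 1.0},
}
ranking = rank_candidates("hepatitis", {"liver": 1.0, "virus": 1.0}, candidates)
print(ranking)
```

In practice the interpolation weight would be tuned on held-out term pairs, and either score could be replaced by a stronger model without changing the combination step.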