A Cross-Lingual Similarity Measure for Detecting Biomedical Term Translations


Abstract

Bilingual dictionaries for technical terms such as biomedical terms are an important resource for machine translation systems as well as for humans who would like to understand a concept described in a foreign language. Often a biomedical term is first proposed in English and only later manually translated into other languages. Although there are large monolingual lexicons of biomedical terms, only a fraction of those lexicons has been translated into other languages. Manually compiling large-scale bilingual dictionaries for technical domains is a challenging task because it is difficult to find a sufficiently large number of bilingual experts. We propose a cross-lingual similarity measure for detecting the most similar translation candidates in one language (target) for a biomedical term specified in another language (source). Specifically, a biomedical term in a language is represented using two types of features: (a) intrinsic features that consist of character n-grams extracted from the term under consideration, and (b) extrinsic features that consist of unigrams and bigrams extracted from the contextual windows surrounding the term under consideration. We propose a cross-lingual similarity measure using each of those feature types. First, to reduce the dimensionality of the feature space in each language, we propose prototype vector projection (PVP), a non-negative lower-dimensional vector projection method. Second, we propose a method to learn a mapping between the feature spaces in the source and target languages using partial least squares regression (PLSR). The proposed method requires only a small number of training instances to learn a cross-lingual similarity measure. The proposed PVP method outperforms popular dimensionality reduction methods such as singular value decomposition (SVD) and non-negative matrix factorization (NMF) in a nearest neighbor prediction task. Moreover, our experimental results covering several language pairs such as English–French, English–Spanish, English–Greek, and English–Japanese show that the proposed method outperforms several other feature projection methods in biomedical term translation prediction tasks.
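To make the intrinsic feature representation concrete, the following is a minimal Python sketch, not the authors' implementation. It counts character n-grams for a term and then projects the resulting sparse vector onto a small set of prototype vectors. The abstract does not define PVP precisely, so the projection here (cosine similarity against each prototype) is only one plausible reading. The function names, the boundary markers `^` and `$`, the n-gram sizes, and the example prototype terms are all assumptions made for illustration. The PLSR step that maps between source and target feature spaces is not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(term, n_values=(2, 3)):
    """Count character n-grams of a term (intrinsic features).

    The boundary markers and the choice of n are illustrative assumptions.
    """
    padded = f"^{term.lower()}$"
    counts = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def project_onto_prototypes(features, prototypes):
    """Hypothetical PVP-style projection: one coordinate per prototype.

    Every coordinate is non-negative because n-gram counts are non-negative.
    """
    return [cosine(features, p) for p in prototypes]


# Hypothetical prototype terms; how prototypes are selected is not covered here.
prototype_terms = ["cardiology", "hepatitis", "neuropathy"]
prototypes = [char_ngrams(t) for t in prototype_terms]
projected = project_onto_prototypes(char_ngrams("cardiomyopathy"), prototypes)
```

In this sketch `projected` has one entry per prototype term, so the dimensionality of the representation equals the number of prototypes rather than the size of the n-gram vocabulary.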
