Conference proceedings: Advances in Information Retrieval

Transliteration Equivalence Using Canonical Correlation Analysis


Abstract

We address the problem of Transliteration Equivalence, i.e. determining whether a pair of words in two different languages (e.g. Auden and its rendering in another script) are name transliterations of each other. This problem is at the heart of Mining Name Transliterations (MINT) from various sources of multilingual text data, including parallel, comparable, and non-comparable corpora and multilingual news streams. MINT is useful in several cross-language tasks, including Cross-Language Information Retrieval (CLIR), Machine Translation (MT), and Cross-Language Named Entity Retrieval. We propose a novel approach to Transliteration Equivalence using language-neutral representations of names. The key idea is to treat name transliterations in two languages as two views of the same semantic object and to compute a low-dimensional common feature space using Canonical Correlation Analysis (CCA). The similarity of two names in the common feature space forms the basis for classifying the pair as transliterations. We show that our approach outperforms state-of-the-art baselines in the CLIR task for Hindi-English (3 collections) and Tamil-English (2 collections).
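The two-view idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which features, regularization, similarity measure, or threshold the authors use, so everything below is an assumption made for illustration: character bigram counts as features, a ridge-regularized CCA solved by whitening plus SVD with NumPy, cosine similarity in the projected space, and a threshold of 0.5. The helper names (`char_ngrams`, `fit_cca`, `is_transliteration`) and the six toy English-Hindi name pairs are hypothetical and are not taken from the paper.

```python
import numpy as np

def char_ngrams(word, n=2):
    """Character n-grams of a word padded with boundary markers (assumed features)."""
    padded = f"^{word}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_vocab(words, n=2):
    """Map every n-gram seen in the word list to a column index."""
    grams = sorted({g for w in words for g in char_ngrams(w, n)})
    return {g: k for k, g in enumerate(grams)}

def featurize(words, vocab, n=2):
    """Bag-of-n-grams count matrix, one row per word."""
    X = np.zeros((len(words), len(vocab)))
    for row, w in enumerate(words):
        for g in char_ngrams(w, n):
            if g in vocab:  # n-grams unseen in training are ignored
                X[row, vocab[g]] += 1.0
    return X

def fit_cca(X, Y, dim, reg=1e-3):
    """Regularized CCA: whiten each view, then take the SVD of the cross-covariance."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    A = np.linalg.solve(Lx.T, U[:, :dim])     # projection for view 1
    B = np.linalg.solve(Ly.T, Vt.T[:, :dim])  # projection for view 2
    return {"mx": mx, "my": my, "A": A, "B": B, "corr": s[:dim]}

def similarity(model, x, y):
    """Cosine similarity of two feature vectors in the common space."""
    u = (x - model["mx"]) @ model["A"]
    v = (y - model["my"]) @ model["B"]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Toy training pairs (illustrative only, not from the paper's data).
english = ["ram", "sita", "gita", "raman", "kamal", "anil"]
hindi = ["राम", "सीता", "गीता", "रमन", "कमल", "अनिल"]
en_vocab, hi_vocab = build_vocab(english), build_vocab(hindi)
X, Y = featurize(english, en_vocab), featurize(hindi, hi_vocab)
model = fit_cca(X, Y, dim=5)

def is_transliteration(en_word, hi_word, threshold=0.5):
    """Classify a pair by thresholding similarity (the threshold is an assumption)."""
    x = featurize([en_word], en_vocab)[0]
    y = featurize([hi_word], hi_vocab)[0]
    return similarity(model, x, y) >= threshold
```

With only six pairs and far more features than examples, the fitted projections essentially memorize the training set, so scoring these same pairs only exercises the mechanics. A realistic setup would need a much larger list of known transliteration pairs and a threshold tuned on held-out data.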
