Journal: ACM Transactions on Asian Language Information Processing

Measuring Similarity between Transliterations against Noise Data

Abstract

When newspaper and magazine editors translate proper nouns from foreign languages into Chinese, the Chinese translations they choose (termed transliterations) are typically phonetically similar to the original word. Because many translators work without a common standard, the same proper noun may have many different Chinese transliterations: some use the same sounds but different Chinese characters, and others differ in both sounds and characters. This confuses readers and, more importantly, leads to incomplete Chinese Web search results. This article investigates similarity comparison of transliterations as a first step toward solving the incomplete search problem. We devise a method based on comparing digitized Chinese character (Hanzi) sounds. We compare it with four other methods, based on grapheme or phoneme similarity, on the task of identifying synonymous transliterations among noise words taken from Web pages. Experimental results indicate that our method surpasses the other methods because its sound vectors contain more discriminative information. The second-best method is based on a scheme that assigns similarity between phonemes by carefully considering their articulatory features, including using multivalued features and placing different weights on the features. Among the six pinyin schemes used to romanize Chinese transliterations, the Tongyong scheme outperforms the others.
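
The abstract does not spell out either the sound-vector method or the phoneme-similarity scheme, so the following is only a rough Python sketch of the general idea behind the second-best approach: phonemes carry multivalued articulatory features, each feature has a weight, and two transliterations are compared by aligning their phoneme sequences. The feature table, the weights, and the function names `phoneme_similarity` and `sequence_similarity` are all hypothetical and are not taken from the article.

```python
# Hypothetical multivalued articulatory features in [0, 1] for a few
# consonants. The values are illustrative assumptions, not the article's.
FEATURES = {
    "b": {"place": 1.0, "manner": 1.0, "aspirated": 0.0},
    "p": {"place": 1.0, "manner": 1.0, "aspirated": 1.0},
    "d": {"place": 0.85, "manner": 1.0, "aspirated": 0.0},
    "t": {"place": 0.85, "manner": 1.0, "aspirated": 1.0},
    "s": {"place": 0.85, "manner": 0.8, "aspirated": 0.0},
    "m": {"place": 1.0, "manner": 0.6, "aspirated": 0.0},
}

# Hypothetical feature weights: a larger weight makes a feature matter more.
WEIGHTS = {"place": 40, "manner": 50, "aspirated": 5}


def phoneme_similarity(a, b):
    """Return a similarity in [0, 1] from weighted feature differences."""
    total = sum(WEIGHTS.values())
    diff = sum(
        WEIGHTS[f] * abs(FEATURES[a][f] - FEATURES[b][f]) for f in WEIGHTS
    )
    return 1.0 - diff / total


def sequence_similarity(xs, ys):
    """Align two phoneme sequences and return a normalized similarity.

    Dynamic programming picks the alignment with the highest summed
    phoneme similarity; skipped phonemes contribute nothing. The sum is
    divided by the longer sequence length, so the result is in [0, 1].
    """
    n, m = len(xs), len(ys)
    if max(n, m) == 0:
        return 1.0
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + phoneme_similarity(xs[i - 1], ys[j - 1]),
                score[i - 1][j],  # skip a phoneme in xs
                score[i][j - 1],  # skip a phoneme in ys
            )
    return score[n][m] / max(n, m)


if __name__ == "__main__":
    # Sequences differing only in aspiration score higher than unrelated ones.
    print(sequence_similarity(["b", "d"], ["p", "t"]))
    print(sequence_similarity(["b", "d"], ["s", "m"]))
```

Under these assumed weights, a pair such as "b" and "p", which differ only in aspiration, is scored as closer than "b" and "s", which differ in both place and manner. A threshold on the sequence score could then separate candidate synonymous transliterations from noise words, though the article's actual decision rule is not given here.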