Journal: ACM Transactions on Asian Language Information Processing

A Phonetic Similarity Model for Automatic Extraction of Transliteration Pairs



Abstract

This article proposes an approach for the automatic extraction of transliteration pairs from Chinese Web corpora. In this approach, we formulate the machine transliteration process using a syllable-based phonetic similarity model which consists of phonetic confusion matrices and a Chinese character n-gram language model. With the phonetic similarity model, the extraction of transliteration pairs becomes a two-step process of recognition followed by validation: First, in the recognition process, we identify the most probable transliteration in the k-neighborhood of a recognized English word. Then, in the validation process, we qualify the transliteration pair candidates with a hypothesis test. We carry out an analytical study on the statistics of several key factors in English-Chinese transliteration to help formulate phonetic similarity modeling. We then conduct both supervised and unsupervised learning of a phonetic similarity model on a development database. The experimental results validate the effectiveness of the phonetic similarity model by achieving an F-measure of 0.739 in supervised learning. The unsupervised learning approach works almost as well as the supervised one, thus allowing us to deploy automatic extraction of transliteration pairs in the Web space.
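The two-step process of recognition followed by validation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the toy confusion table `CONFUSION`, the smoothing constant `FLOOR`, the function names, and the fixed `threshold`, which stands in for the hypothesis test the abstract mentions. The Chinese character n-gram language model is omitted for brevity, and candidates are given as romanized syllable lists.

```python
import math

# Hypothetical toy confusion table: P(Chinese syllable | English syllable).
# The values are invented for illustration, not estimated from any corpus.
CONFUSION = {
    ("ke", "ke"): 0.8, ("ke", "ge"): 0.2,
    ("lin", "lin"): 0.9, ("lin", "ling"): 0.1,
    ("dun", "dun"): 0.7, ("dun", "deng"): 0.3,
}
FLOOR = 1e-6  # assumed smoothing probability for unseen syllable pairs


def phonetic_log_score(english_syllables, chinese_syllables):
    """Sum the log confusion probabilities of aligned syllable pairs."""
    if len(english_syllables) != len(chinese_syllables):
        return float("-inf")  # this sketch only aligns equal-length sequences
    return sum(
        math.log(CONFUSION.get((e, c), FLOOR))
        for e, c in zip(english_syllables, chinese_syllables)
    )


def recognize(english_syllables, neighborhood):
    """Recognition: return the best-scoring candidate in the neighborhood."""
    return max(
        neighborhood,
        key=lambda cand: phonetic_log_score(english_syllables, cand),
    )


def validate(english_syllables, candidate, threshold=-2.0):
    """Validation: accept if the per-syllable log score clears a threshold.

    A fixed threshold is a simplification standing in for a hypothesis test.
    """
    if not english_syllables:
        return False
    score = phonetic_log_score(english_syllables, candidate)
    return score / len(english_syllables) >= threshold


english = ["ke", "lin", "dun"]
neighborhood = [
    ["ge", "ling", "deng"],
    ["ke", "lin", "dun"],
    ["ma", "li", "ya"],
]
best = recognize(english, neighborhood)
accepted = validate(english, best)
```

In this toy setup, recognition selects the candidate with the highest summed log probability, and validation then filters out pairs whose average per-syllable score is too low to be a plausible transliteration. A fuller model along the lines of the abstract would also add the language model score for the Chinese character sequence.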
