A linguistically motivated approach to grapheme-to-phoneme conversion for Korean

Abstract

This paper describes a hand-written rule-based grapheme-to-phoneme (GTP) conversion system for Korean built within the Festival text-to-speech (TTS) synthesis framework. The core of the GTP conversion system is a simple implementation of nine linguistically motivated morphophonological rules. These rules, which are well known to students of Korean linguistics, were implemented in Festival rewrite formalism, and were applied to 1.3 million distinct orthographic words (space-delimited eojeols) from the Korean Newswire corpus. The outputs were evaluated against a representative subset of eojeols. The subset was examined by three native speakers of Korean, who judged 91.17% of the word types in a stratified sample of Korean eojeols to be acceptable pronunciations, which means that our system converted 99.63% of the grapheme tokens correctly. This performance is comparable to that obtained from earlier studies such as Kim et al. [Morpheme-based grapheme to phoneme conversion using phonetic patterns and morphophonemic connectivity information. ACM Transactions on Asian Language Information Processing 1 (1) (2002) 65–82] which, contrary to our system, used an elaborate morphological analysis module. This is evidence of the potential benefit of well-abstracted linguistic knowledge. In addition, because our approach is based on well-known linguistic principles, error analysis is fairly straightforward. Straightforward error analysis is an essential step in knowing what features are likely to be informative in training a hybrid system where exceptions to rules are handled by a machine-learning component.
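To illustrate what a context-sensitive morphophonological rewrite rule of this kind can look like, here is a minimal Python sketch of one familiar Korean rule, obstruent nasalization (a plain final ㄱ, ㄷ, or ㅂ becomes ㅇ, ㄴ, or ㅁ before an initial ㄴ or ㅁ). It is an assumption-laden illustration, not the paper's rule set or Festival's rewrite formalism: the function names (`decompose`, `compose`, `nasalize`) are hypothetical, only three plain finals are covered, and the output is a respelled Hangul string rather than a phoneme sequence. Syllables are split into jamo indices using the standard Unicode Hangul syllable arithmetic.

```python
# Illustrative sketch only: one rewrite rule (obstruent nasalization),
# not the nine rules described in the abstract.

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3   # last precomposed Hangul syllable (힣)
NUM_VOWELS = 21
NUM_TAILS = 28         # 27 final consonants plus "no final"

# Final-consonant (tail) indices in the Unicode syllable layout.
TAIL_G, TAIL_N, TAIL_D, TAIL_M, TAIL_B, TAIL_NG = 1, 4, 7, 16, 17, 21
# Initial-consonant (lead) indices for the nasals ㄴ and ㅁ.
LEAD_N, LEAD_M = 2, 6

# Rule table: plain obstruent final -> homorganic nasal final.
NASALIZE = {TAIL_G: TAIL_NG, TAIL_D: TAIL_N, TAIL_B: TAIL_M}


def is_syllable(ch):
    """True if ch is a precomposed Hangul syllable."""
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def decompose(ch):
    """Split a syllable into [lead, vowel, tail] indices."""
    code = ord(ch) - HANGUL_BASE
    lead, rest = divmod(code, NUM_VOWELS * NUM_TAILS)
    vowel, tail = divmod(rest, NUM_TAILS)
    return [lead, vowel, tail]


def compose(lead, vowel, tail):
    """Rebuild a syllable from its jamo indices."""
    return chr(HANGUL_BASE + (lead * NUM_VOWELS + vowel) * NUM_TAILS + tail)


def nasalize(eojeol):
    """Apply the nasalization rule across adjacent syllables of one eojeol."""
    units = [decompose(c) if is_syllable(c) else c for c in eojeol]
    for cur, nxt in zip(units, units[1:]):
        if isinstance(cur, list) and isinstance(nxt, list):
            if cur[2] in NASALIZE and nxt[0] in (LEAD_N, LEAD_M):
                cur[2] = NASALIZE[cur[2]]
    return "".join(compose(*u) if isinstance(u, list) else u for u in units)


print(nasalize("국물"))    # 궁물
print(nasalize("합니다"))  # 함니다
```

A fuller system would chain several such rules in a fixed order, since the output of one rule can feed the context of the next; this sketch applies a single rule and leaves non-Hangul characters unchanged.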

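Bibliographic record

  • Source
    Computer Speech and Language | 2006, Issue 4 | pp. 357-381 | 25 pages
  • Authors

    Kyuchul Yoon; Chris Brew

  • Affiliations

    Department of Linguistics, The Ohio State University, Columbus, OH, USA;

    Department of Linguistics, The Ohio State University, Columbus, OH, USA;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Computing technology, computer technology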