Lexicalized phonotactic word segmentation

Abstract

This paper presents a new unsupervised algorithm (WordEnds) for inferring word boundaries from transcribed adult conversations. Phone ngrams before and after observed pauses are used to bootstrap a simple discriminative model of boundary marking. This fast algorithm delivers high performance even on morphologically complex words in English and Arabic, and promising results on accurate phonetic transcriptions with extensive pronunciation variation. Expanding training data beyond the traditional miniature datasets pushes performance numbers well above those previously reported. This suggests that WordEnds is a viable model of child language acquisition and might be useful in speech understanding.
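
The core idea in the abstract, using phone n-grams next to observed pauses as evidence for word boundaries elsewhere, can be illustrated with a toy sketch. This is not the paper's WordEnds implementation: the function names (`train`, `boundary_score`, `segment`), the relative-frequency scoring, the averaging of left and right evidence, and the threshold are all assumptions made for illustration, and letters stand in for phones.

```python
from collections import Counter

def train(utterances, n=2):
    """Count phone n-grams at pause edges and in all positions.

    Each utterance is a list of phones assumed to be delimited by pauses.
    """
    before_pause = Counter()  # n-grams that end an utterance
    after_pause = Counter()   # n-grams that start an utterance
    everywhere = Counter()    # n-grams in any position
    for phones in utterances:
        if len(phones) < n:
            continue
        before_pause[tuple(phones[-n:])] += 1
        after_pause[tuple(phones[:n])] += 1
        for i in range(len(phones) - n + 1):
            everywhere[tuple(phones[i:i + n])] += 1
    return before_pause, after_pause, everywhere

def boundary_score(phones, i, model, n=2):
    """Score a candidate boundary between phones[i-1] and phones[i]."""
    before_pause, after_pause, everywhere = model
    left = tuple(phones[max(0, i - n):i])
    right = tuple(phones[i:i + n])
    if len(left) < n or len(right) < n:
        return 0.0  # not enough context on one side
    # Fraction of this n-gram's occurrences that were adjacent to a pause.
    p_left = before_pause[left] / everywhere[left] if everywhere[left] else 0.0
    p_right = after_pause[right] / everywhere[right] if everywhere[right] else 0.0
    return (p_left + p_right) / 2  # simple average (an assumption)

def segment(phones, model, n=2, threshold=0.5):
    """Split a phone sequence wherever the boundary score reaches threshold."""
    words, start = [], 0
    for i in range(1, len(phones)):
        if boundary_score(phones, i, model, n) >= threshold:
            words.append(phones[start:i])
            start = i
    words.append(phones[start:])
    return words

# Toy data: letters stand in for phones, each list is one pause-delimited utterance.
toy = [list("thedog"), list("dog"), list("the"), list("thecat"), list("cat")]
model = train(toy)
print(segment(list("thedog"), model, threshold=0.4))
```

On this toy data the only position with non-zero evidence in "thedog" is between "the" and "dog", because "he" sometimes precedes a pause and "do" sometimes follows one. A real system would need smoothing, longer contexts, and a trained classifier rather than a fixed threshold.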
