Conference: Language and Technology Conference

Cross-Lingual Adaptation of Broadcast Transcription System to Polish Language Using Public Data Sources



Abstract

We present methods and procedures designed for cost-efficient adaptation of an existing speech recognition system to Polish. The system, originally built for Czech, is adapted using common texts and speech recordings accessible from Polish web pages. The most critical part, an acoustic model (AM) for Polish, is built in several steps: (a) an initial bootstrapping phase that uses the existing Czech AM, (b) a lightly supervised iterative scheme for automatic collection and annotation of Polish speech data, and finally (c) acquisition of a large amount of broadcast data in an unsupervised way. The developed system has been evaluated in the task of automatic content monitoring of major Polish TV and radio stations. Its transcription accuracy, measured on a set of 4 complete TV news shows with a total duration of 105 min, is 79.2%. For clean studio speech, accuracy exceeds 92%.
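
The abstract reports transcription accuracy but does not say how it is computed. A common convention in speech recognition is word accuracy, defined as one minus the word error rate, where errors are counted by a word-level edit distance between the reference and the recognized text. The sketch below illustrates that convention only; it is an assumption, not necessarily the authors' scoring procedure, and the function names `edit_distance` and `word_accuracy` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Return 1 - WER for whitespace-tokenized strings (assumed metric)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Made-up example: one substituted word out of four.
print(word_accuracy("to jest dobry test", "to jest dobra test"))
```

Under this definition, many inserted words can push the score below zero, which is one reason some evaluations report the word error rate directly instead.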
