LREC-2012

The KIT Lecture Corpus for Speech Translation


Abstract

Academic lectures offer valuable content, but often do not reach their full potential audience due to the language barrier. Human translations of lectures are too expensive to be widely used. Speech translation technology can be an affordable alternative in this case. State-of-the-art spoken language translation systems utilize statistical models that need to be trained on large amounts of in-domain data. In order to support the KIT lecture translation project in its effort to introduce speech translation technology in KIT's lecture halls, we have collected a corpus of German lectures at KIT. In this paper we describe how we recorded the lectures and how we annotated them. We further give detailed statistics on the types of lectures in the corpus and its size. We collected the corpus with the purpose in mind that it should not just be suited for training a spoken language translation system the traditional way, but should also allow us to research techniques that enable the translation system to automatically and autonomously adapt itself to the varying topics and speakers of the different lectures.
