Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates


Abstract

Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.
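
The count of 30 translation directions follows from treating each direction as an ordered (source, target) pair of distinct languages, so 6 × 5 = 30. A minimal sketch of that arithmetic is shown below; the labels `L1` to `L6` are hypothetical placeholders, since the abstract does not name the six languages.

```python
from itertools import permutations

# Hypothetical placeholder labels: the abstract does not list the six languages.
languages = ["L1", "L2", "L3", "L4", "L5", "L6"]

# A translation direction is an ordered (source, target) pair with source != target.
directions = list(permutations(languages, 2))

print(len(directions))  # 6 * 5 = 30
```

Because the pairs are ordered, translating from `L1` into `L2` and from `L2` into `L1` count as two separate directions.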
