Conference: International Conference on Spoken Language Translation

Generating Fluent Translations from Disfluent Text Without Access to Fluent References: IIT Bombay@IWSLT2020

Abstract

Machine translation systems perform reasonably well when the input is well-formed speech or text. Conversational speech, however, is spontaneous and inherently contains many disfluencies. Producing fluent translations of disfluent source text would typically require parallel disfluent-to-fluent training data, but fluent translations of spontaneous speech are an additional resource that is tedious to obtain. This work describes the submission of IIT Bombay to the Conversational Speech Translation challenge at IWSLT 2020. We specifically tackle the problem of disfluency removal in disfluent-to-fluent text-to-text translation, assuming no access to fluent references during training. Common patterns of disfluency are extracted from disfluent references, and a noise induction model is used to simulate them starting from a clean monolingual corpus. This synthetically constructed dataset then serves as a proxy for labeled data during training. We also make use of additional fluent text in the target language to help generate fluent translations. This work uses no fluent references during training and beats a baseline model by 4.21 and 3.11 BLEU points when the baseline uses disfluent and fluent references, respectively.
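The noise induction idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the disfluency patterns or the model used, so everything below is an assumption made for illustration: the filler list `FILLERS`, the three noise types (fillers, word repetitions, restarts), the probabilities, and the function names `induce_disfluency` and `build_synthetic_pairs` are all hypothetical and are not the authors' method. The sketch only shows how a clean monolingual corpus could be turned into synthetic (disfluent, fluent) pairs.

```python
import random

# Hypothetical filler inventory; the abstract does not list the actual patterns.
FILLERS = ["uh", "um", "you know", "i mean"]


def induce_disfluency(sentence, rng, p_filler=0.15, p_repeat=0.10, p_restart=0.05):
    """Return a synthetically disfluent copy of a fluent sentence.

    Noise is only ever inserted, so the fluent tokens always remain
    an in-order subsequence of the noisy output.
    """
    tokens = sentence.split()
    noisy = []
    for i, tok in enumerate(tokens):
        # Filler: insert a filler word or phrase before the current token.
        if rng.random() < p_filler:
            noisy.extend(rng.choice(FILLERS).split())
        # Restart: utter the next two tokens, then start over from this token.
        if rng.random() < p_restart and i + 1 < len(tokens):
            noisy.extend(tokens[i:i + 2])
        # Repetition: say the current token twice.
        if rng.random() < p_repeat:
            noisy.append(tok)
        noisy.append(tok)
    return " ".join(noisy)


def build_synthetic_pairs(corpus, seed=0, **noise_kwargs):
    """Pair each fluent sentence with a noised version: (disfluent, fluent)."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    return [(induce_disfluency(s, rng, **noise_kwargs), s) for s in corpus]


if __name__ == "__main__":
    clean_corpus = ["i want to go home", "the weather is nice today"]
    for disfluent, fluent in build_synthetic_pairs(clean_corpus, seed=1):
        print(disfluent, "->", fluent)
```

In a setup like the one the abstract describes, such pairs would stand in for labeled disfluent-to-fluent data; in practice the noise patterns and their frequencies would be estimated from the disfluent references rather than fixed by hand as they are here.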
