MARS: A Statistical Semantic Parsing and Generation-Based Multilingual Automatic tRanslation System


Abstract

We present MARS (Multilingual Automatic tRanslation System), a research-prototype speech-to-speech translation system. MARS is aimed at two-way conversational spoken-language translation between English and Mandarin Chinese in limited domains, such as air travel reservations. In MARS, machine translation is embedded within a complex speech processing task, and translation performance is strongly affected by the performance of the other components, such as the recognizer and the semantic parser. All components in the proposed system are statistically trained on an appropriate training corpus. The speech signal is first recognized by an automatic speech recognizer (ASR). Next, the ASR-transcribed text is analyzed by a semantic parser, which uses a statistical decision-tree model that does not require hand-crafted grammars or rules. The parser also provides semantic information that supports further re-scoring of the speech recognition hypotheses. The semantic content extracted by the parser is formatted into a language-independent tree structure, which is used for interlingua-based translation. A maximum-entropy-based, sentence-level natural language generation (NLG) approach is used to generate sentences in the target language from the semantic tree representations. Finally, the generated target sentence is synthesized into speech by a speech synthesizer. Many new features and innovations have been incorporated into MARS: the translation is based on understanding the meaning of the sentence; the semantic parser uses a statistical model and is trained on a semantically annotated corpus; the output of the semantic parser is used to select a more specific language model to refine speech recognition performance; and the NLG component uses a statistical model and is trained on the same annotated corpus.
These features give MARS the advantages of robustness to speech disfluencies and recognition errors, tighter integration of semantic information into speech recognition, and portability to new languages and domains. These advantages are verified by our experimental results.
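
The data flow described in the abstract (transcribed text, then a language-independent semantic tree, then a target-language sentence) can be illustrated with a minimal sketch. This is not the MARS implementation: a keyword lookup stands in for the statistical decision-tree parser, fixed templates stand in for the maximum-entropy generator, and the ASR and speech synthesis stages are omitted. All names here (`SemNode`, `parse`, `generate`, `translate`, the lexicon and template entries) are hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class SemNode:
    """One node of a language-independent semantic tree (hypothetical format)."""
    label: str
    value: str = ""
    children: list[SemNode] = field(default_factory=list)


# Toy resources (assumptions): a real system would learn these from an annotated corpus.
CITIES = {"boston": "BOS", "denver": "DEN"}
CITY_NAMES = {
    "en": {"BOS": "Boston", "DEN": "Denver"},
    "zh": {"BOS": "波士顿", "DEN": "丹佛"},
}
TEMPLATES = {
    "en": "I want to fly from {FROM} to {TO}.",
    "zh": "我想从{FROM}飞往{TO}。",
}


def parse(text: str) -> SemNode:
    """Stand-in for the semantic parser: keep only 'from <city>' / 'to <city>'."""
    tokens = text.lower().split()
    root = SemNode("FLIGHT_REQUEST")
    for prev, tok in zip(tokens, tokens[1:]):
        if prev in ("from", "to") and tok in CITIES:
            root.children.append(SemNode(prev.upper(), CITIES[tok]))
    return root


def generate(tree: SemNode, lang: str) -> str:
    """Stand-in for NLG: fill a per-language template from the tree's slots."""
    slots = {child.label: CITY_NAMES[lang][child.value] for child in tree.children}
    return TEMPLATES[lang].format(**slots)


def translate(asr_text: str, target_lang: str) -> str:
    """Interlingua-style translation: source text -> semantic tree -> target text."""
    return generate(parse(asr_text), target_lang)


print(translate("uh i want to fly from boston to denver please", "zh"))
# prints: 我想从波士顿飞往丹佛。
```

Because the generator sees only the semantic tree, filler words such as "uh" and "please" never reach the output, which is a toy version of the robustness to disfluencies that the abstract claims. The sketch raises a `KeyError` when a slot is missing; handling incomplete trees is left out for brevity.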

Bibliographic record

  • Source
    Machine Translation | 2002, Issue 3 | pp. 185-212 | 28 pages
  • Author affiliations

    IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA;

    IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA;

    IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA;

    IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA;

    IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA;

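  • Indexed in: Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology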