Unsupervised data processing for classifier-based speech translator


Abstract

Concept classification has been used as a translation method and has shown notable benefits within the suite of speech-to-speech translation applications. However, the main bottleneck in achieving acceptable performance with such classifiers is the cumbersome task of annotating large amounts of training data. Any attempt to develop a method that assists in, or completely automates, data annotation needs a distance measure to compare sentences based on the concepts they convey. Here, we introduce a new method of sentence comparison that is motivated by the translation point of view. In this method, the imperfect translations produced by a phrase-based statistical machine translation system are used to compare the concepts of the source sentences. Three clustering methods are adapted to support the concept-based distance. These methods are applied to prepare groups of paraphrases, which are then used as training sets in concept classification tasks. Statistical machine translation is also used to enhance the training data for the classifier, which is crucial when such data are sparse. Experiments show the effectiveness of the proposed methods.
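
The idea of comparing source sentences through their machine translations can be illustrated with a small sketch. The abstract does not give the actual distance or the three clustering methods, so everything below is an assumption made for illustration: the distance is a plain Jaccard distance over the words of each sentence's translation hypotheses, the grouping is a simple single-link threshold clustering, and the German "translations" in `toy_translations` are hand-written stand-ins for the output of a phrase-based system.

```python
from itertools import combinations


def translation_distance(hyps_a, hyps_b):
    """Jaccard distance between the word sets of two hypothesis lists.

    Stand-in for the paper's concept-based distance, which the
    abstract does not specify.
    """
    words_a = {w for h in hyps_a for w in h.lower().split()}
    words_b = {w for h in hyps_b for w in h.lower().split()}
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def cluster_paraphrases(translations, threshold=0.7):
    """Group source sentences whose translations are closer than threshold.

    Single-link grouping via union-find; the threshold value is an
    arbitrary choice for this toy example.
    """
    parent = {s: s for s in translations}

    def find(s):
        # Follow parent links to the cluster representative.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(translations, 2):
        if translation_distance(translations[a], translations[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in translations:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: source sentence -> hand-written translation hypotheses.
toy_translations = {
    "where does it hurt": ["wo tut es weh", "wo haben sie schmerzen"],
    "where is the pain": ["wo tut es weh", "wo ist der schmerz"],
    "what is your name": ["wie ist ihr name", "wie heissen sie"],
}

if __name__ == "__main__":
    for group in cluster_paraphrases(toy_translations):
        print(group)
```

In this toy setting the two pain-related questions share most of their translated words and fall into one group, while the unrelated question stays alone. Groups of this kind are what the abstract describes as paraphrase sets used for training a concept classifier.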

Publication details

  • Source
    Computer Speech and Language | 2013, Issue 2 | pp. 438-454 | 17 pages
  • Author affiliations

    Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, 3710 S. McClintock Ave., RTH 320, Los Angeles, CA 90089, USA (the same affiliation is listed for all three entries)

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    speech to speech translation; spoken language understanding; concept classification;
