
Automatic categorization for improving Spanish into Spanish Sign Language machine translation



Abstract

This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list of Spanish words (vocabulary) and associated tags used by this module is computed automatically by considering, for every Spanish word, the sign that shows the highest probability of being its translation. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words (Spanish words not assigned to any sign) have been studied. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). The system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. To evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translations has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module increased BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
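The tag extraction and word replacement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the word-to-sign probability is estimated, so the sketch assumes a simple Dice co-occurrence score as a stand-in. The function names `extract_tags` and `preprocess`, the `min_score` threshold, the `NON_RELEVANT` placeholder tag, and the three handling modes (`keep`, `drop`, `tag`) are all hypothetical, as is the toy corpus.

```python
from collections import Counter, defaultdict


def extract_tags(parallel, min_score=0.8):
    """Map each Spanish word to its best-scoring sign gloss.

    `parallel` is a list of (spanish_words, sign_glosses) sentence pairs.
    ASSUMPTION: a Dice co-occurrence score stands in for the translation
    probability mentioned in the abstract. Words whose best score is below
    `min_score` get no tag and are treated as non-relevant.
    """
    word_n = Counter()             # sentence pairs containing each word
    sign_n = Counter()             # sentence pairs containing each sign
    pair_n = defaultdict(Counter)  # word -> co-occurrence count per sign
    for words, signs in parallel:
        w_set, s_set = set(words), set(signs)
        word_n.update(w_set)
        sign_n.update(s_set)
        for w in w_set:
            pair_n[w].update(s_set)

    tags = {}
    for w, counts in pair_n.items():
        dice = {s: 2 * c / (word_n[w] + sign_n[s]) for s, c in counts.items()}
        if not dice:               # word only seen with empty sign sequences
            continue
        best = max(dice, key=dice.get)
        if dice[best] >= min_score:
            tags[w] = best
    return tags


def preprocess(words, tags, non_relevant="keep"):
    """Replace tagged words by their tag; handle the rest per `non_relevant`.

    ASSUMPTION: the three modes are illustrative only; the abstract does not
    name the alternatives that were studied.
    """
    if non_relevant not in ("keep", "drop", "tag"):
        raise ValueError(f"unknown mode: {non_relevant}")
    out = []
    for w in words:
        if w in tags:
            out.append(tags[w])
        elif non_relevant == "keep":
            out.append(w)
        elif non_relevant == "tag":
            out.append("NON_RELEVANT")
        # "drop": the word is simply omitted
    return out


# Toy parallel corpus (invented for illustration, not taken from the paper).
corpus = [
    (["renovar", "el", "dni"], ["DNI", "RENOVAR"]),
    (["renovar", "el", "carnet"], ["CARNET", "RENOVAR"]),
    (["el", "dni"], ["DNI"]),
    (["pagar", "el", "carnet"], ["CARNET", "PAGAR"]),
]
tags = extract_tags(corpus)
print(preprocess(["renovar", "el", "dni"], tags, non_relevant="drop"))
```

In this toy corpus the article "el" co-occurs with every sign about equally, so no sign clears the threshold and the word ends up non-relevant, while content words such as "dni" map to a single sign. The preprocessed sentences would then be passed to the translation system in place of the raw Spanish text.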

Bibliographic details

  • Source
    Computer Speech and Language | 2012, Issue 3 | pp. 149-167 | 19 pages
  • Author affiliations

    Grupo de Tecnología del Habla, Departamento de Ingeniería Electrónica, ETSI Telecomunicación, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040 Madrid, Spain (listed identically for all six author entries)

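  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords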

    Spanish Sign Language (LSE); statistical language translation; syntactic-semantic information; automatic tagging; automatic categorization

