Pattern Recognition Letters
Conditional random fields versus template-matching in MT phrasing tasks involving sparse training data


Abstract

This communication compares the template-matching technique to established probabilistic approaches, such as conditional random fields (CRF), on a specific linguistic task: the phrasing of a sequence of words into phrases. This task is a low-level parsing of the sequence into linguistically motivated phrases. CRF is the established method for implementing such a data-driven parser, while template-matching is a simpler method that is faster to train and operate. The two techniques are compared here to determine the most suitable approach for extracting an accurate model. The specific application studied relates to a machine translation (MT) methodology (namely PRESEMT), though the comparison also holds for other applications for which only sparse training data are available. PRESEMT uses small parallel corpora to learn structural transformations from a source language (SL) to a target language (TL) and thus translate input text. As a result, only sparse training data are available for training the parser. Experimental results indicate that for a limited-size training set, as is the case for the PRESEMT methodology, template-matching generates a superior phrasing model that in turn yields higher-quality translations. This is confirmed for more than one source/target language pair, over multiple independent test sets.
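To make the template-matching idea concrete, the following is a minimal illustrative sketch, not the actual PRESEMT implementation: phrase templates are sequences of part-of-speech tags, and a tagged sentence is segmented greedily by matching the longest template at each position. The templates and tag names below are invented for illustration.

```python
def template_match_phrases(tags, templates):
    """Greedily segment a POS-tag sequence into phrases by matching the
    longest template at each position; tags matching no template become
    singleton phrases. Hypothetical sketch of a template-matching phraser."""
    templates = sorted(templates, key=len, reverse=True)  # longest-match first
    phrases, i = [], 0
    while i < len(tags):
        for tpl in templates:
            if tuple(tags[i:i + len(tpl)]) == tuple(tpl):
                phrases.append(tags[i:i + len(tpl)])
                i += len(tpl)
                break
        else:
            # no template matched here: emit a single-word phrase
            phrases.append([tags[i]])
            i += 1
    return phrases


# Toy templates (invented): determiner-adjective-noun, determiner-noun,
# preposition-determiner-noun.
templates = [("DT", "JJ", "NN"), ("DT", "NN"), ("IN", "DT", "NN")]
tags = ["DT", "JJ", "NN", "VBZ", "IN", "DT", "NN"]
print(template_match_phrases(tags, templates))
# [['DT', 'JJ', 'NN'], ['VBZ'], ['IN', 'DT', 'NN']]
```

Such a matcher needs no iterative parameter estimation, which is why it can be trained and applied quickly on the sparse parallel corpora the abstract describes, whereas a CRF must estimate feature weights from the same limited data.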
