Pattern Recognition Letters
Conditional random fields versus template-matching in MT phrasing tasks involving sparse training data


Abstract

This communication compares the template-matching technique to established probabilistic approaches, such as conditional random fields (CRF), on a specific linguistic task: the phrasing of a sequence of words into phrases. This task is a low-level parsing of the sequence into linguistically motivated phrases. CRF is the established method for implementing such a data-driven parser, while template-matching is a simpler method that is faster to train and operate. The two techniques are compared here to determine the most suitable approach for extracting an accurate model. The specific application studied relates to a machine translation (MT) methodology (namely PRESEMT), though the comparison also holds for other applications for which only sparse training data are available. PRESEMT uses small parallel corpora to learn structural transformations from a source language (SL) to a target language (TL) and thus translate input text. As a result, only sparse training data are available for training the parser. Experimental results indicate that for a limited-size training set, as is the case for the PRESEMT methodology, template-matching generates a superior phrasing model that in turn yields higher-quality translations. This is confirmed for more than one source/target language pair, over multiple independent test sets.
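To make the template-matching idea concrete, the following is a minimal illustrative sketch, not the actual PRESEMT implementation: phrase templates are sequences of part-of-speech tags, and a tagged sentence is segmented greedily by matching the longest template at each position. The templates and tag names below are invented for illustration.

```python
def template_match_phrases(tags, templates):
    """Greedily segment a POS-tag sequence into phrases by matching the
    longest template at each position; tags matching no template become
    singleton phrases. Hypothetical sketch of a template-matching phraser."""
    templates = sorted(templates, key=len, reverse=True)  # longest-match first
    phrases, i = [], 0
    while i < len(tags):
        for tpl in templates:
            if tuple(tags[i:i + len(tpl)]) == tuple(tpl):
                phrases.append(tags[i:i + len(tpl)])
                i += len(tpl)
                break
        else:
            # no template matched here: emit a single-word phrase
            phrases.append([tags[i]])
            i += 1
    return phrases


# Toy templates (invented): determiner-adjective-noun, determiner-noun,
# preposition-determiner-noun.
templates = [("DT", "JJ", "NN"), ("DT", "NN"), ("IN", "DT", "NN")]
tags = ["DT", "JJ", "NN", "VBZ", "IN", "DT", "NN"]
print(template_match_phrases(tags, templates))
# [['DT', 'JJ', 'NN'], ['VBZ'], ['IN', 'DT', 'NN']]
```

Such a matcher needs no iterative parameter estimation, which is why it can be trained and applied quickly on the sparse parallel corpora the abstract describes, whereas a CRF must estimate feature weights from the same limited data.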
