International journal of computational linguistics and applications

BLEU Deconstructed: Designing a Better MT Evaluation Metric


Abstract

BLEU is the de facto standard automatic evaluation metric in machine translation. While BLEU is undeniably useful, it has a number of limitations. Although it works well for large documents and multiple references, it is unreliable at the sentence or sub-sentence levels, and with a single reference. In this paper, we propose new variants of BLEU which address these limitations, resulting in a more flexible metric which is not only more reliable, but also allows for more accurate discriminative training. Our best metric has better correlation with human judgements than standard BLEU, despite using a simpler formulation. Moreover, these improvements carry over to a system tuned for our new metric.
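To make the sentence-level brittleness mentioned in the abstract concrete, below is a minimal sketch of standard single-reference, sentence-level BLEU (modified n-gram precision with clipping, geometric mean, brevity penalty). It is not the paper's proposed variant; the tiny `smooth` constant is only there so the logarithm is defined, and the example sentences are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of length n from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hypothesis, reference, max_n=4, smooth=1e-9):
    """Single-reference, sentence-level BLEU in its standard formulation.

    With one reference and a short sentence, a single missing
    higher-order n-gram drives the geometric mean toward zero --
    the unreliability the abstract refers to. `smooth` is an
    arbitrary floor so log() is defined, not the paper's fix.
    """
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped counts: each hypothesis n-gram is credited at most
        # as many times as it occurs in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        log_precisions.append(math.log(max(clipped / total, smooth)))
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(sentence_bleu("the cat sat on the mat", ref))  # 1.0: exact match
    print(sentence_bleu("the cat is on the mat", ref))   # ~0: no matching 4-gram
```

The second example shows the failure mode: one substituted word removes every 4-gram match, so the score collapses even though the hypothesis is mostly correct; with a large document or multiple references this effect averages out, which is why document-level BLEU remains usable.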
