On Some Pitfalls in Automatic Evaluation and Significance Testing for MT

Abstract

We investigate some pitfalls regarding the discriminatory power of MT evaluation metrics and the accuracy of statistical significance tests. In a discriminative reranking experiment for phrase-based SMT we show that the NIST metric is more sensitive than BLEU or F-score despite their incorporation of aspects of fluency or meaning adequacy into MT evaluation. In an experimental comparison of two statistical significance tests we show that p-values are estimated more conservatively by approximate randomization than by bootstrap tests, thus increasing the likelihood of type-I error for the latter. We point out a pitfall of randomly assessing significance in multiple pairwise comparisons, and conclude with a recommendation to combine NIST with approximate randomization, at more stringent rejection levels than is currently standard.
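
The two significance tests compared in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each system has one score per test sentence and that the metric can be recomputed from those per-sentence items (the hypothetical `mean_metric` simply averages them; corpus-level metrics such as BLEU or NIST would instead be recomputed from per-sentence sufficient statistics). The function names are hypothetical, and the bootstrap variant shown shifts the resampled differences to a zero mean, which is one common formulation and may differ in detail from the one used in the paper.

```python
import random
from typing import Callable, Sequence

# A "metric" maps a list of per-sentence items to one corpus-level score.
Metric = Callable[[Sequence[float]], float]


def mean_metric(scores: Sequence[float]) -> float:
    """Hypothetical stand-in metric: the average of per-sentence scores."""
    return sum(scores) / len(scores)


def approx_randomization(a: Sequence[float], b: Sequence[float],
                         metric: Metric = mean_metric,
                         shuffles: int = 1000, seed: int = 0) -> float:
    """Paired approximate randomization test; returns a two-sided p-value."""
    if len(a) != len(b):
        raise ValueError("systems must be scored on the same sentences")
    rng = random.Random(seed)
    observed = abs(metric(a) - metric(b))
    count = 0
    for _ in range(shuffles):
        shuffled_a, shuffled_b = [], []
        for x, y in zip(a, b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap the two outputs of each sentence with probability 0.5.
            if rng.random() < 0.5:
                x, y = y, x
            shuffled_a.append(x)
            shuffled_b.append(y)
        if abs(metric(shuffled_a) - metric(shuffled_b)) >= observed:
            count += 1
    # Add-one smoothing so the estimated p-value is never exactly zero.
    return (count + 1) / (shuffles + 1)


def bootstrap_test(a: Sequence[float], b: Sequence[float],
                   metric: Metric = mean_metric,
                   samples: int = 1000, seed: int = 0) -> float:
    """Paired bootstrap test (shift-to-null variant); returns a p-value."""
    if len(a) != len(b):
        raise ValueError("systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(a)
    observed = metric(a) - metric(b)
    count = 0
    for _ in range(samples):
        # Resample sentence indices with replacement, keeping outputs paired.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = metric([a[i] for i in idx]) - metric([b[i] for i in idx])
        # Centre the bootstrap differences on zero to mimic the null
        # hypothesis, then count how often they are at least as extreme.
        if abs(delta - observed) >= abs(observed):
            count += 1
    return (count + 1) / (samples + 1)
```

For example, `approx_randomization(scores_a, scores_b)`, where `scores_a` and `scores_b` are hypothetical per-sentence scores of two systems on the same test set, returns an estimated p-value; in line with the abstract's recommendation, it would be compared against a more stringent rejection level than is currently standard, particularly when many pairwise comparisons are made.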

机译：我们调查了有关 MT评估的歧视性指标和统计显着性的准确性测试。在有区别的重新排名中基于短语的SMT实验，我们表明NIST指标更加敏感尽管加入了BLEU或F分数流利或意义的方面 MT评估的充分性。在一个两种统计的实验比较显着性检验表明，p值通过近似保守估计随机比引导测试，从而增加了可能性后者的I型错误。我们指出摆脱了随机评估重要性的陷阱在多个成对比较中并提出以下建议：将NIST与近似随机化相结合，在更严格的拒绝水平比目前的标准。

Bibliographic details

Source: 43rd Annual Meeting of the Association for Computational Linguistics: Proceedings of the Conference, 2005, pp. 57-64 (8 pages)
Authors: Stefan Riezler; John T. Maxwell III
Original format: PDF

Similar literature

1. Significance tests of automatic machine translation evaluation metrics [J]. Ying Zhang, Stephan Vogel. Machine Translation, 2010, No. 1.
2. BRCA1 and BRCA2 genetic testing-pitfalls and recommendations for managing variants of uncertain clinical significance [J]. Eccles D. M., Mitchell G., Monteiro A. N. A. Annals of Oncology: Official Journal of the European Society for Medical Oncology, 2015, No. 10.
3. Significance tests in clinical research-Challenges and pitfalls [J]. Eva Skovlund. Scandinavian Journal of Pain, 2013, No. 4.
4. The Significance of Recall in Automatic Metrics for MT Evaluation [C]. Alon Lavie, Kenji Sagae, Shyamsundar Jayaraman. Conference of the Association for Machine Translation in the Americas (AMTA 2004); 2004-09-28 to 2004-10-02; Washington, DC (US), 2004.
5. Advancing Millimeter-Wave Vehicular Radar Test Targets for Automatic Emergency Braking (AEB) Sensor Evaluation [D]. Belgiovane, Domenic John, Jr., 2017.
6. Evaluation of Cell Cycle Arrest in Estrogen Responsive MCF-7 Breast Cancer Cells: Pitfalls of the MTS Assay [O]. Eileen M. McGowan, Nikki Alling, Elise A. Jackson, 2008.
7. The Significance of Recall in Automatic Metrics for MT Evaluation [O]. Alon Lavie, Kenji Sagae, Shyamsundar Jayaraman, 2004.
8. Testing of the Prototype Automatic Marine Telephone System (AMTS) [R]. Spence, R. E., 1986.