
An investigation of the relationship between automated Machine Translation Evaluation metrics and user performance on an information extraction task.


Abstract

This dissertation applies nonparametric statistical techniques to Machine Translation (MT) evaluation, using data from an MT evaluation experiment conducted through a joint Army Research Laboratory (ARL) and Center for the Advanced Study of Language (CASL) project. In particular, it studies the relationship between human performance on an information extraction task over translated documents and well-known automated translation evaluation metric scores for those documents. Findings from a correlation analysis of the connection between autometrics and task-based metrics are presented and contrasted with current strategies for evaluating translations. A novel approach to assessing partial rank correlation in the presence of grouping factors is also introduced. Finally, the dissertation presents a framework for task-based MT evaluation and predictive modeling of task responses that yields new information about the relative predictive strengths of the different autometrics (and re-coded variants of them) within the statistical Generalized Linear Models developed in analyses of the information extraction task data. This work shows that current autometrics are inadequate for predicting task performance, but that near-adequacy can be achieved by using re-coded autometrics in a logistic regression setting. As a result, a class of automated metrics best suited to predicting performance is established, and suggestions are offered on how to use metrics to supplement expensive and time-consuming experiments with human participants. Users can now begin to tie intrinsic automated metrics to extrinsic metrics for the tasks they perform. The bottom line is that MT-system dependence needs to be averaged away: averaged metrics predict better overall than the original autometrics. Moreover, combinations of re-coded metrics performed better than any individual metric.
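The central modeling claim above — that re-coded (binned) autometric scores can come close to predicting binary task success in a logistic regression — can be sketched as follows. This is a minimal illustration, not the dissertation's actual model: the bin edges, the simulated scores, and the success probabilities are all assumptions made for the example.

```python
import math
import random

random.seed(0)

# Hypothetical setup: each translated document has a continuous automated
# metric score in [0, 1] (e.g. a BLEU-like score); task success is binary.
# "Re-coding" here means binning the continuous score into ordered
# categories, in the spirit of the dissertation's re-coded autometrics.

def recode(score, edges=(0.25, 0.5, 0.75)):
    """Map a continuous score to an ordinal bin index 0..len(edges)."""
    return sum(score > e for e in edges)

# Simulated data: higher metric scores make task success more likely.
scores = [random.random() for _ in range(400)]
y = [1 if random.random() < 1 / (1 + math.exp(-(6 * s - 3))) else 0
     for s in scores]
x = [recode(s) for s in scores]  # recoded predictor

# Fit a one-predictor logistic regression by gradient ascent on the
# log-likelihood (a minimal stand-in for a GLM fit).
b0, b1, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        p = 1 / (1 + math.exp(-(b0 + b1 * xi)))
        g0 += yi - p
        g1 += (yi - p) * xi
    b0 += lr * g0 / len(x)
    b1 += lr * g1 / len(x)

print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

Because the simulated success probability rises with the metric score, the fitted slope on the recoded predictor comes out positive, mirroring the qualitative finding that recoded autometrics carry predictive signal for task performance.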
Ultimately, MT evaluation methodology is extended to create new metrics specifically relevant to task-based comparisons. A formal method for establishing that differences among metrics as predictors are too strong to be due to chance remains future work. Given the lack of connection in the field of MT evaluation between task utility and the interpretation of automated evaluation metrics, as well as the absence of solid statistical reasoning in evaluating MT, there is a need to bring innovative, interdisciplinary analytical techniques to this problem. Because no prior work in the MT evaluation literature has applied statistical modeling in this way or linked automated metrics to how well MT supports human tasks, this work is unique and has high potential to benefit the machine translation research community.
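The future-work item above — formally testing whether differences among metrics as predictors exceed chance — could conceivably be approached with a paired permutation test. The sketch below is purely illustrative and is not from the dissertation: the per-document prediction accuracies of the two hypothetical metrics are simulated.

```python
import random

random.seed(1)

# Hypothetical data: for each of n documents, whether metric A's prediction
# of task success was correct, and likewise for metric B.
n = 200
correct_a = [random.random() < 0.72 for _ in range(n)]
correct_b = [random.random() < 0.65 for _ in range(n)]

observed = (sum(correct_a) - sum(correct_b)) / n  # accuracy gap A - B

def perm_diff():
    # Under the null (metrics equally good), the A/B labels within each
    # document are exchangeable, so swap them at random and recompute.
    diff = 0
    for a, b in zip(correct_a, correct_b):
        if random.random() < 0.5:
            a, b = b, a
        diff += a - b
    return diff / n

null = [perm_diff() for _ in range(2000)]
p_value = sum(abs(d) >= abs(observed) for d in null) / len(null)
print(f"observed accuracy gap={observed:.3f}, p={p_value:.3f}")
```

A small p-value would indicate that the observed gap between the two metrics' predictive accuracies is unlikely under the hypothesis that they are interchangeable — one possible formalization of the test the dissertation leaves open.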

Record details

  • Author: Tate, Calandra Rilette
  • Affiliation: University of Maryland, College Park — Applied Mathematics and Scientific Computation
  • Degree-granting institution: University of Maryland, College Park — Applied Mathematics and Scientific Computation
  • Subjects: Mathematics; Statistics; Computer Science
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 164 p.
  • Format: PDF
  • Language: English
