Workshop on Human Evaluation of NLP Systems

Estimating Subjective Crowd-Evaluations as an Additional Objective to Improve Natural Language Generation

Abstract

Human ratings are one of the most prevalent methods to evaluate the performance of natural language processing algorithms. Similarly, it is common to use human raters to measure the quality of sentences generated by a natural language generation model. In this paper, we argue for exploring the use of subjective evaluations within the process of training language generation models in a multi-task learning setting. As a case study, we use a crowd-authored dialogue corpus to fine-tune six different language generation models. Two of these models incorporate multi-task learning and use subjective ratings of lines as part of an explicit learning goal. A human evaluation of the generated dialogue lines reveals that utterances generated by the multi-tasking models were subjectively rated as the most typical, as doing the most to move the conversation forward, and as the least offensive. Based on these promising first results, we discuss future research directions for incorporating subjective human evaluations into language model training and thereby keeping the human user in the loop during the development process.
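To make the multi-task setup concrete, below is a minimal sketch (in PyTorch with Hugging Face transformers) of the general idea, not the authors' released code: a GPT-2 fine-tuning step whose loss combines the standard language-modelling objective with an auxiliary head that regresses the subjective crowd rating of each line. The model choice, the single-scalar rating head, and the names RatedGenerator and rating_weight are all illustrative assumptions.

    # Minimal sketch of the multi-task objective described in the abstract;
    # all names (RatedGenerator, rating_weight) are illustrative assumptions.
    import torch
    import torch.nn as nn
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    class RatedGenerator(nn.Module):
        def __init__(self, model_name="gpt2", rating_weight=0.5):
            super().__init__()
            self.lm = GPT2LMHeadModel.from_pretrained(model_name)
            # Auxiliary head: regress a scalar subjective rating (e.g. a 1-5
            # "typicality" score) from the final token's hidden state.
            self.rating_head = nn.Linear(self.lm.config.n_embd, 1)
            self.rating_weight = rating_weight
            self.mse = nn.MSELoss()

        def forward(self, input_ids, attention_mask, ratings):
            # Standard LM loss; padding positions are masked out with -100.
            labels = input_ids.masked_fill(attention_mask == 0, -100)
            out = self.lm(input_ids=input_ids,
                          attention_mask=attention_mask,
                          labels=labels,
                          output_hidden_states=True)
            # Pool the hidden state of the last non-padding token per sequence.
            last = attention_mask.sum(dim=1) - 1
            hidden = out.hidden_states[-1]
            pooled = hidden[torch.arange(hidden.size(0)), last]
            rating_loss = self.mse(self.rating_head(pooled).squeeze(-1), ratings)
            # Multi-task loss: generation objective plus rating estimation.
            return out.loss + self.rating_weight * rating_loss

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
    model = RatedGenerator()
    batch = tokenizer(["How was your day?", "Fine, thanks. And yours?"],
                      return_tensors="pt", padding=True)
    loss = model(batch["input_ids"], batch["attention_mask"],
                 ratings=torch.tensor([3.5, 4.0]))  # hypothetical crowd ratings
    loss.backward()

In such a setup, each training batch would pair dialogue lines with their crowd ratings and back-propagate the combined loss; the weighting factor trades off fluency against rating estimation and would in practice be tuned on held-out human judgements.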