Journal: Computer Speech and Language

Evaluating spoken dialogue systems according to de-facto standards: A case study

Abstract

In the present paper, we investigate the validity and reliability of de-facto evaluation standards, defined for measuring or predicting the quality of the interaction with spoken dialogue systems. Two experiments have been carried out with a dialogue system for controlling domestic devices. During these experiments, subjective judgments of quality have been collected by two questionnaire methods (ITU-T Rec. P.851 and SASSI), and parameters describing the interaction have been logged and annotated. Both metrics served the derivation of prediction models according to the PARADISE approach. Although the limited database allows only tentative conclusions to be drawn, the results suggest that both questionnaire methods provide valid measurements of a large number of different quality aspects; most of the perceptive dimensions underlying the subjective judgments can also be measured with a high reliability. The extracted parameters mainly describe quality aspects which are directly linked to the system, environmental and task characteristics. Used as an input to prediction models, the parameters provide helpful information for system design and optimization, but not general predictions of system usability and acceptability.
