Automatic annotation of context and speech acts for dialogue corpora

Abstract

Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds 'Information State Update' (ISU) representations of dialogue context for the Communicator (2000 and 2001) corpora of human-machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available.
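The pipeline described in the abstract (parse each utterance into speech acts, then update a running representation of dialogue context) can be illustrated with a minimal sketch. This is not the authors' system: the `InfoState` fields, the slot names `orig_city` and `dest_city`, the `provide_info` act label, and the keyword patterns are all hypothetical stand-ins chosen for illustration, and a real parser and ISU dialogue manager would be far richer.

```python
from dataclasses import dataclass, field
import re


@dataclass
class InfoState:
    """Toy dialogue context: filled slots plus the speech-act history."""
    filled_slots: dict = field(default_factory=dict)
    history: list = field(default_factory=list)


# Hypothetical keyword patterns; assumed for illustration only.
CITY = r"(boston|denver|london)"
PATTERNS = [
    ("provide_info", "orig_city", re.compile(r"\bfrom " + CITY)),
    ("provide_info", "dest_city", re.compile(r"\bto " + CITY)),
]


def parse_speech_acts(speaker, utterance):
    """Map an utterance to a list of (speaker, act, slot, value) tuples."""
    text = utterance.lower()
    acts = []
    for act, slot, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            acts.append((speaker, act, slot, match.group(1)))
    if not acts:
        # Nothing recognised: record a placeholder act.
        acts.append((speaker, "unknown", None, None))
    return acts


def update(state, acts):
    """Return a new InfoState after applying the speech acts (one update step)."""
    new_state = InfoState(dict(state.filled_slots), list(state.history))
    for speaker, act, slot, value in acts:
        new_state.history.append((speaker, act, slot))
        if speaker == "user" and act == "provide_info":
            new_state.filled_slots[slot] = value
    return new_state


def annotate(dialogue):
    """Annotate each (speaker, utterance) turn with acts and the resulting context."""
    state = InfoState()
    annotated = []
    for speaker, utterance in dialogue:
        acts = parse_speech_acts(speaker, utterance)
        state = update(state, acts)
        annotated.append({"speaker": speaker, "acts": acts, "state": state})
    return annotated
```

Calling `annotate` on a list of `(speaker, utterance)` pairs yields one record per turn, each carrying the parsed acts and a snapshot of the context after that turn. Returning a fresh `InfoState` at every step keeps the per-turn snapshots independent, which is what a sequence of context representations requires.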

Bibliographic record

  • Source
    Natural Language Engineering | 2009, Issue 3 | pp. 315-353 | 39 pages
  • Author affiliations

    Institute for Creative Technologies, University of Southern California, 13274 Fiji Way, Marina del Rey, CA 90292, USA;

    School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK;

    Department of Computer Science, University of Geneva, Battelle batiment A, 7 route de Drize, 1227 Carouge, Switzerland;

    School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK;

  • Original format: PDF
  • Language of text: English