Conference: Workshop on NLP for Conversational AI

How to Tame Your Data: Data Augmentation for Dialog State Tracking

Abstract

Dialog State Tracking (DST) is a problem space in which the effective vocabulary is practically limitless. For example, the domain of possible movie titles or restaurant names is bound only by the limits of language. As such, DST systems often encounter out-of-vocabulary words at inference time that were never encountered during training. To combat this issue, we present a targeted data augmentation process, by which a practitioner observes the types of errors made on held-out evaluation data, and then modifies the training data with additional corpora to increase the vocabulary size at training time. Using this with a RoBERTa-based Transformer architecture, we achieve state-of-the-art results in comparison to systems that only mask trouble slots with special tokens. Additionally, we present a data-representation scheme for seamlessly retargeting DST architectures to new domains.
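The abstract does not spell out how the training data is modified, so the following is only a minimal sketch of one plausible reading: slot values in existing training utterances are swapped for values drawn from an additional corpus, so that the model sees a wider vocabulary for the slots that caused errors on held-out data. The function names `augment_example` and `augment_dataset`, the `(utterance, slots)` data layout, and the `value_pool` mapping are all hypothetical and are not taken from the paper.

```python
import random


def augment_example(utterance, slots, value_pool, rng):
    """Swap each slot value for one drawn from value_pool (hypothetical helper).

    utterance  -- the user turn as a plain string
    slots      -- dict mapping slot name to the value mentioned in the utterance
    value_pool -- dict mapping slot name to candidate values from an extra corpus
    rng        -- a random.Random instance, passed in for reproducibility
    """
    new_utterance = utterance
    new_slots = {}
    for slot, value in slots.items():
        candidates = value_pool.get(slot)
        # Only substitute when we have candidates and the value is literally
        # present in the text; otherwise keep the original annotation.
        if candidates and value in new_utterance:
            replacement = rng.choice(candidates)
            new_utterance = new_utterance.replace(value, replacement)
            new_slots[slot] = replacement
        else:
            new_slots[slot] = value
    return new_utterance, new_slots


def augment_dataset(examples, value_pool, copies=2, seed=0):
    """Return the original examples followed by `copies` augmented variants of each."""
    rng = random.Random(seed)
    augmented = list(examples)
    for utterance, slots in examples:
        for _ in range(copies):
            augmented.append(augment_example(utterance, slots, value_pool, rng))
    return augmented


# Usage: the pool would be built only for the slot types that showed
# errors on held-out data (an assumption about the "targeted" step).
examples = [("book a table at Nandos", {"restaurant_name": "Nandos"})]
pool = {"restaurant_name": ["The Golden Wok", "Cafe Uno"]}
for text, labels in augment_dataset(examples, pool, copies=2):
    print(text, labels)
```

Under this reading, the targeted part of the process lies in choosing which slots get a value pool. A practitioner would inspect the held-out errors first and add external values only for the slot types that fail, rather than augmenting every slot uniformly.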