Conference: International Joint Conference on Artificial Intelligence

Submodularity-Inspired Data Selection for Goal-Oriented Chatbot Training Based on Sentence Embeddings

Abstract

Spoken language understanding (SLU) systems, such as goal-oriented chatbots or personal assistants, rely on an initial natural language understanding (NLU) module to determine the intent and to extract the relevant information from the user queries they take as input. SLU systems usually help users solve problems in relatively narrow domains and require a large amount of in-domain training data. This leads to significant data availability issues that inhibit the development of successful systems. To alleviate this problem, we propose a data selection technique for the low-data regime that enables us to train with fewer labeled sentences and thus at lower labeling cost. We propose a submodularity-inspired data ranking function, the ratio-penalty marginal gain, for selecting data points to label based only on information extracted from the textual embedding space. We show that distances in the embedding space are a viable source of information for data selection. Our method outperforms two known active learning techniques and enables cost-efficient training of the NLU unit. Moreover, our selection technique does not require the model to be retrained between selection steps, which makes it time-efficient as well.
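
The abstract does not give the formula for the ratio-penalty marginal gain, so the following is only a rough sketch of the general idea of greedy, submodularity-style selection from sentence embeddings. It swaps in a standard facility-location coverage objective as a stand-in and is not the authors' ranking function. The names `greedy_select`, `embeddings`, and `budget` are hypothetical, and the rescaled cosine similarity is an assumption made for illustration. As in the abstract, no model is trained between selection steps; only the embedding geometry is used.

```python
import numpy as np


def greedy_select(embeddings: np.ndarray, budget: int) -> list[int]:
    """Greedily pick up to `budget` rows under a facility-location objective.

    Stand-in objective (an assumption, not the paper's function):
        f(S) = sum_i max_{j in S} sim(i, j)
    """
    # Cosine similarity rescaled to [0, 1] so that f is non-negative.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = (1.0 + unit @ unit.T) / 2.0

    n = sim.shape[0]
    best = np.zeros(n)                 # coverage of each point by the chosen set
    chosen = np.zeros(n, dtype=bool)   # mask of already selected points
    selected: list[int] = []

    for _ in range(min(budget, n)):
        # Marginal gain of candidate j: how much it improves every point's coverage.
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        gains[chosen] = -np.inf        # never pick the same sentence twice
        j = int(np.argmax(gains))
        selected.append(j)
        chosen[j] = True
        best = np.maximum(best, sim[:, j])

    return selected


if __name__ == "__main__":
    # Toy "sentence embeddings": two loose clusters in 2-D.
    toy = np.array([
        [1.0, 0.0], [0.99, 0.1], [0.98, -0.1],
        [0.0, 1.0], [0.1, 0.99], [-0.1, 0.98],
    ])
    print(greedy_select(toy, budget=2))
```

With a budget of two on the toy data, the second pick comes from the cluster the first pick did not cover, which is the diversity behaviour that makes such objectives attractive when choosing which sentences to label.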