Journal: Computers and the Humanities

Automatic induction of language model data for a spoken dialogue system

Abstract

In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling after a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
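The class-level sampling idea in the second stage can be illustrated with a small sketch. This is not the authors' method: the abstract says the target distribution is obtained through user simulation or from developer dialogues, whereas here it is simply passed in as a dictionary. The function name `sample_to_target`, the semantic class labels, and the example utterances are all hypothetical.

```python
import random
from collections import defaultdict


def sample_to_target(corpus, target, n, seed=0):
    """Draw n utterances so that semantic classes follow `target`.

    corpus: list of (utterance, semantic_class) pairs (hypothetical format).
    target: dict mapping semantic_class -> desired probability (assumed given).
    """
    rng = random.Random(seed)

    # Group the synthetic utterances by their semantic class.
    by_class = defaultdict(list)
    for utterance, sem_class in corpus:
        by_class[sem_class].append(utterance)

    # Only classes that actually have utterances can be sampled.
    classes = [c for c in target if by_class[c]]
    weights = [target[c] for c in classes]

    # First pick a class per the target distribution, then pick an
    # utterance uniformly at random within that class.
    drawn = rng.choices(classes, weights=weights, k=n)
    return [(rng.choice(by_class[c]), c) for c in drawn]


# Illustrative data only; not taken from the paper.
synthetic_corpus = [
    ("i want a thai restaurant", "cuisine"),
    ("show me some italian food", "cuisine"),
    ("anything near the harbor", "location"),
    ("what is the phone number", "phone"),
]
target_distribution = {"cuisine": 0.7, "location": 0.3, "phone": 0.0}
sampled = sample_to_target(synthetic_corpus, target_distribution, n=1000)
```

Sampling with replacement keeps the sketch short; a real filter over a large synthetic corpus might sample without replacement or also weight individual sentences, which corresponds to the sentence-level distribution the abstract mentions.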
