Statistical methods in natural language understanding and spoken dialogue systems

Abstract

Modern automatic spoken dialogue systems cover a wide range of applications. There are systems for hotel reservations, restaurant guides, systems for travel and timetable information, as well as systems for automatic telephone-banking services. Building the different components of a spoken dialogue system and combining them in an optimal way, such that a reasonable dialogue becomes possible, is a complex task because the system has to deal with uncertain information during the course of a dialogue. In this thesis, we use statistical methods to model and combine the system's components. Statistical methods provide a well-founded theory for modeling systems where decisions have to be made under uncertainty. Starting from Bayes' decision rule, we define and evaluate various statistical models for these components, which comprise speech recognition, natural language understanding, and dialogue management. The problem of natural language understanding is described as a special machine translation problem where a source sentence is translated into a formal language target sentence consisting of concepts. For this, we define and evaluate two models. The first model is a generative model based on the source-channel paradigm. Because the word context plays an important role in natural language understanding tasks, we use a phrase-based translation system in order to take local context dependencies into account. The second model is a direct model based on the maximum entropy framework and works similarly to a tagger. For the direct model, we define several feature functions that capture dependencies between words and concepts. Both methods have the advantage that only source-target pairs in the form of input-output sentences must be provided for training. Thus, there is no need to generate grammars manually, which significantly reduces the cost of building dialogue systems for new domains.
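The two understanding models described above can be written down compactly. The following is only a sketch in standard textbook notation, not the thesis's own formulation: the symbols $w_1^N$ (input word sequence), $c_1^M$ (concept sequence), $h_k$ (feature functions), and $\lambda_k$ (feature weights) are assumptions introduced here for illustration.

```latex
% Source-channel (generative) model: Bayes' decision rule, with the posterior
% factored into a prior over concept sequences and a translation model.
\hat{c}_1^M = \operatorname*{argmax}_{c_1^M} \Pr(c_1^M \mid w_1^N)
            = \operatorname*{argmax}_{c_1^M} \Pr(c_1^M)\,\Pr(w_1^N \mid c_1^M)

% Direct (maximum entropy) model: the posterior is modeled directly as a
% log-linear combination of K feature functions, normalized over all
% competing concept hypotheses c'.
p_\lambda(c \mid w) =
  \frac{\exp\big(\sum_{k=1}^{K} \lambda_k\, h_k(c, w)\big)}
       {\sum_{c'} \exp\big(\sum_{k=1}^{K} \lambda_k\, h_k(c', w)\big)}
```

In the first form, the denominator $\Pr(w_1^N)$ is dropped because it does not depend on the concept sequence being maximized over.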
Furthermore, we propose and investigate a framework based on minimum error rate training that results in a tighter coupling between speech recognition and language understanding. This framework allows for an easy integration of multiple knowledge sources by minimizing the overall error criterion. It is therefore possible to add language understanding features to the speech recognition framework in order to minimize the word error rate, or to add speech recognition features to the language understanding framework in order to minimize the slot error rate. Finally, we develop a task-independent dialogue manager using trees as the fundamental data structure. Based on a cost function, the dialogue manager chooses the next dialogue action with minimal cost. The design and the task-independence of the dialogue manager lead to a strict separation of a given application and the operations performed by the dialogue manager, which simplifies porting an existing dialogue system to a new domain. We report results from a field test in which the dialogue manager was able to choose the optimal dialogue action in 90% of the decisions. We investigate techniques for error handling based on confidence measures defined for speech recognition and language understanding. Furthermore, we investigate the overall performance of the dialogue system when confidence measures from speech recognition and natural language understanding are incorporated into the dialogue strategy. Experiments have been carried out on the TelDir database, which is a German in-house telephone directory assistance corpus, and on the Taba database, which is a German in-house corpus for a train time scheduling task.
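To make the idea of cost-based action selection over a tree concrete, here is a minimal Python sketch. It is not the thesis's dialogue manager: the `Node` class, the action names (`ask`, `confirm`, `execute_query`), the cost values, and the confidence threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One slot in a hypothetical task tree (e.g. a field of a directory query)."""
    name: str
    value: Optional[str] = None   # None means the slot has not been filled yet
    confidence: float = 0.0       # assumed confidence score in [0, 1]
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> Iterator[Node]:
    """Yield the leaf nodes of the tree in depth-first order."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def candidate_actions(root: Node, threshold: float = 0.7) -> Iterator[Tuple[float, str]]:
    """Yield (cost, action) pairs; the costs are illustrative assumptions."""
    pending = False
    for leaf in leaves(root):
        if leaf.value is None:
            pending = True
            yield 2.0, f"ask({leaf.name})"            # asking is the most expensive
        elif leaf.confidence < threshold:
            pending = True
            # Confirming gets more expensive the further below the threshold we are.
            yield 1.0 + (threshold - leaf.confidence), f"confirm({leaf.name})"
    if not pending:
        yield 0.0, "execute_query"                    # everything filled and trusted


def next_action(root: Node, threshold: float = 0.7) -> str:
    """Return the candidate action with minimal cost."""
    return min(candidate_actions(root, threshold))[1]


tree = Node("query", children=[
    Node("last_name", value="Macherey", confidence=0.4),
    Node("first_name"),
])
print(next_action(tree))  # confirm(last_name)
```

Because the selection logic only inspects generic tree nodes, the task-specific part of this sketch is confined to the tree that is passed in, which mirrors the separation between application and dialogue manager described in the abstract.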

Bibliographic record

  • Author

    Macherey Klaus;

  • Author affiliation
  • Year: 2009
  • Total pages
  • Original format: PDF
  • Language of text: eng
  • CLC classification
  • Date indexed: 2022-08-20 20:29:10
