Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction


Abstract

We investigate algorithms and tools for the semi-automatic authoring of grammars for spoken dialogue systems (SDS), proposing a framework that spans from corpus creation to grammar induction algorithms. A realistic human-in-the-loop approach is followed, balancing automation and human intervention to optimize the cost-to-performance ratio of grammar development. Web harvesting is the main approach investigated for eliciting spoken dialogue textual data, while crowdsourcing is also proposed as an alternative method. Several techniques are presented for constructing web queries and filtering the acquired corpora. We also investigate how the harvested corpora can be used for the automatic and semi-automatic (human-in-the-loop) induction of grammar rules. SDS grammar rules and induction algorithms are grouped into two types, namely low-level and high-level. Two families of algorithms are investigated for rule induction: one based on semantic similarity and distributional semantic models, and the other using more traditional statistical modeling approaches (e.g., slot-filling algorithms using Conditional Random Fields). Evaluation results are presented for two domains and languages. Precision scores of up to 60% are obtained for high-level induction. The results support the portability of the proposed features and algorithms across languages and domains.
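To make the idea of similarity-based rule induction more concrete, here is a minimal sketch, not the authors' method. It assumes a simple count-based distributional model in which each word is represented by the words that occur within a small window around it, and it uses cosine similarity to decide whether a candidate word should join an existing grammar rule such as a hypothetical `<CITY>` rule. The function names, the window size, the 0.5 threshold, and the toy corpus are all hypothetical illustrations; the paper's actual features and similarity metrics are not described in this abstract.

```python
import math
from collections import Counter, defaultdict

# Toy "harvested" corpus (hypothetical example data, not from the paper).
EXAMPLE_SENTENCES = [
    "i want to fly to athens tomorrow",
    "i want to fly to paris tomorrow",
    "book a flight to london on monday",
    "book a flight to athens on friday",
    "i want to fly to london tonight",
    "book a flight to paris on monday",
    "please cancel my booking now",
]


def context_vectors(sentences, window=2):
    """Build a bag-of-words context vector for every token (a simple distributional model)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the token itself
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def induce_rule_members(seeds, candidates, vectors, threshold=0.5):
    """Hypothetical low-level induction step: add a candidate to the rule
    if its mean similarity to the seed members reaches the threshold."""
    members = list(seeds)
    for candidate in candidates:
        score = sum(cosine(vectors[candidate], vectors[s]) for s in seeds) / len(seeds)
        if score >= threshold:
            members.append(candidate)
    return members


if __name__ == "__main__":
    vecs = context_vectors(EXAMPLE_SENTENCES)
    # Seed a hypothetical <CITY> rule and test three candidate words.
    print(induce_rule_members(["athens", "paris"], ["london", "monday", "booking"], vecs))
```

On this toy corpus, "london" shares contexts such as "fly to" and "flight to ... on" with the seed cities and is added to the rule, while "monday" and "booking" fall below the threshold. In a human-in-the-loop setting, such automatically proposed members would typically be shown to a grammar developer for approval rather than accepted blindly.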

Bibliographic details

  • Source
    Computer Speech and Language, 2018, No. 1, pp. 272-297 (26 pages)
  • Author affiliations

    School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece; "Athena" Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Athens, Greece;

    School of Electronic and Computer Engineering, Technical University of Crete, 73100 Chania, Greece;

    School of Electronic and Computer Engineering, Technical University of Crete, 73100 Chania, Greece;

    School of Electronic and Computer Engineering, Technical University of Crete, 73100 Chania, Greece;

    School of Electronic and Computer Engineering, Technical University of Crete, 73100 Chania, Greece;

    Voice Web S.A., 15124 Athens, Greece;

    School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece; "Athena" Research and Innovation Center in Information, Communication and Knowledge Technologies, 15125 Athens, Greece;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of full text: English
  • Keywords

    Spoken dialogue systems; Grammar induction; Corpora creation; Semantic similarity; Web mining; Crowdsourcing;

  • Date added to database: 2022-08-18 02:11:02
