Conference of the International Speech Communication Association

IsNL? A Discriminative Approach to Detect Natural Language Like Queries for Conversational Understanding
Asli Celikyilmaz, Gokhan Tur, Dilek Hakkani-Tür
Microsoft Silicon Valley, USA

Abstract

While data-driven methods for spoken language understanding (SLU) deliver state-of-the-art performance and reduce maintenance and model adaptation costs compared to handcrafted parsers, collecting and annotating domain-specific natural language utterances for training remains a time-consuming task. A recent line of research has focused on enriching the training data with in-domain utterances mined from search engine query logs to improve SLU tasks. However, genre mismatch is a major obstacle, as search queries are typically keywords. In this paper, we present an efficient discriminative binary classification method that filters a large collection of web search queries to select only the natural-language-like queries. The training data used to build this classifier is mined from search query click logs, represented as a bipartite graph. Starting from queries that contain salient natural language phrases, random graph walk algorithms are employed to mine the corresponding keyword queries. An active learning method is then employed to quickly improve on this automatically mined data. The results show that our method is robust to noise in search queries, improving over a baseline model previously used for SLU data collection. We also show the effectiveness of the detected natural-language-like queries in extrinsic evaluations on domain detection and slot filling tasks.
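
To make the mining step concrete, the following is a minimal sketch of a two-step random walk over a query-URL click graph. It is an illustration only, not the authors' algorithm: the abstract does not specify the walk used, and the function name `two_step_walk`, the `(query, url) -> clicks` dictionary format, and the toy data are all assumptions. Starting from a natural-language seed query, the walk steps to a clicked URL and then back to a query, so that keyword queries sharing clicks with the seed receive a higher score.

```python
from collections import defaultdict


def two_step_walk(clicks, seed_query):
    """Score queries reachable from seed_query via query -> URL -> query.

    clicks: dict mapping (query, url) -> click count (hypothetical toy format).
    Returns a dict mapping each other query to its two-step walk probability.
    """
    # Index the bipartite graph in both directions.
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (query, url), count in clicks.items():
        q_to_u[query][url] = count
        u_to_q[url][query] = count

    scores = defaultdict(float)
    seed_total = sum(q_to_u[seed_query].values())
    for url, count_qu in q_to_u[seed_query].items():
        # Probability of stepping from the seed query to this URL.
        p_url = count_qu / seed_total
        url_total = sum(u_to_q[url].values())
        for other, count_uq in u_to_q[url].items():
            if other != seed_query:
                # Probability of stepping from the URL back to another query.
                scores[other] += p_url * count_uq / url_total
    return dict(scores)


# Toy click log with made-up queries, URLs, and counts.
clicks = {
    ("how do i get to the nearest pharmacy", "maps.example/pharmacy"): 3,
    ("nearest pharmacy", "maps.example/pharmacy"): 6,
    ("pharmacy hours", "maps.example/pharmacy"): 3,
    ("how do i get to the nearest pharmacy", "health.example/faq"): 1,
    ("pharmacy faq", "health.example/faq"): 1,
}
scores = two_step_walk(clicks, "how do i get to the nearest pharmacy")
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy log the keyword query "nearest pharmacy" scores highest because it shares the most clicks with the seed. Pairs mined this way could serve as training examples for a binary classifier, with the seed as the natural-language-like class and the mined queries as the keyword class.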
