A Unified Endpointer Using Multitask and Multidomain Training


Abstract

Speech recognition systems generally assign endpointers different roles for long-form speech and for voice queries: speech detection in the former and query endpoint detection in the latter. Speech detection is useful for segmentation and pre-filtering in long-form speech processing. Query endpoint detection, on the other hand, predicts when to stop listening and send the audio received so far for action. It thus determines system latency and is an essential component of interactive voice systems. For both tasks, the endpointer needs to be robust in challenging environments, including noisy conditions, reverberant environments, and environments with background speech, and it has to generalize well to different domains with different speaking styles and rhythms. This work investigates building a unified endpointer by folding the separate speech detection and query endpoint detection tasks into a single neural network model through multitask learning. A categorical domain representation is further incorporated into the model to encourage learning domain-specific information. Compared with simply pooling all the data together, the final unified model improves latency by around 100 ms (18% relative) for near-field voice queries and 150 ms (21% relative) for far-field voice queries; compared with a standalone speech detection model, it reduces frame error rate on long-form speech by 7% relative. The proposed approach also shows good robustness to noisy environments and yields a 180 ms latency improvement on voice queries from an unseen domain.
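
The idea of one model serving both tasks, conditioned on a categorical domain input, can be illustrated with a small sketch. This is not the authors' architecture: the abstract does not specify layers, class sets, or loss weights, so the domain names, the class lists, the single tanh layer, the `UnifiedEndpointerSketch` class, and the `endpoint_weight` parameter below are all assumptions made for illustration. The sketch shows a shared layer that receives acoustic features concatenated with a one-hot domain vector, two task-specific softmax heads, and a combined cross-entropy loss.

```python
import math
import random

# Hypothetical label sets; the abstract does not list the actual ones.
DOMAINS = ["near_field", "far_field", "long_form"]
SPEECH_CLASSES = ["speech", "non_speech"]
ENDPOINT_CLASSES = ["speech", "initial_silence",
                    "intermediate_silence", "final_silence"]


def one_hot(index, size):
    """Categorical representation: 1.0 at `index`, 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class UnifiedEndpointerSketch:
    """Shared layer plus two task heads (illustrative only)."""

    def __init__(self, feature_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # The one-hot domain vector is appended to the acoustic features.
        in_dim = feature_dim + len(DOMAINS)
        self.shared = init_layer(rng, hidden_dim, in_dim)
        self.speech_head = init_layer(rng, len(SPEECH_CLASSES), hidden_dim)
        self.endpoint_head = init_layer(rng, len(ENDPOINT_CLASSES), hidden_dim)

    def forward(self, frame, domain):
        """Return (speech detection, endpoint detection) posteriors."""
        x = list(frame) + one_hot(DOMAINS.index(domain), len(DOMAINS))
        hidden = [math.tanh(h) for h in linear(x, *self.shared)]
        return (softmax(linear(hidden, *self.speech_head)),
                softmax(linear(hidden, *self.endpoint_head)))


def multitask_loss(speech_probs, endpoint_probs,
                   speech_label, endpoint_label, endpoint_weight=1.0):
    """Sum of the two per-frame cross-entropy losses."""
    speech_loss = -math.log(speech_probs[speech_label])
    endpoint_loss = -math.log(endpoint_probs[endpoint_label])
    return speech_loss + endpoint_weight * endpoint_loss


# Usage: one 4-dimensional frame from a far-field voice query.
model = UnifiedEndpointerSketch(feature_dim=4)
speech_probs, endpoint_probs = model.forward([0.1, -0.3, 0.5, 0.0], "far_field")
loss = multitask_loss(speech_probs, endpoint_probs,
                      speech_label=0, endpoint_label=3)
```

Because both heads read the same hidden representation, gradients from either loss term would update the shared layer, which is the usual motivation for multitask training. A real endpointer would operate on frame sequences with a recurrent or convolutional model rather than on isolated frames.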

Bibliographic details

Source
IEEE Automatic Speech Recognition and Understanding Workshop, 2019, pp. 100-106 (7 pages)
Authors
Shuo-Yiin Chang; Bo Li; Gabor Simko
Original format
PDF
Keywords
Task analysis; Speech recognition; Training; Indexes; Google; Voice activity detection; Computational modeling


