Journal: Computer Speech and Language

Fast vocabulary acquisition in an NMF-based self-learning vocal user interface

Abstract

In command-and-control applications, a vocal user interface (VUI) is useful for hands-free control of various devices, especially for people with a physical disability. The spoken utterances are usually restricted to a predefined list of phrases or to a restricted grammar, and the acoustic models work well for normal speech. While some state-of-the-art methods allow for user adaptation of the predefined acoustic models and lexicons, we pursue a fully adaptive VUI by learning both vocabulary and acoustics directly from interaction examples. A learning curve usually has a steep rise in the beginning and an asymptotic ceiling at the end. To limit tutoring time and to guarantee good performance in the long run, the word learning rate of the VUI should be fast and the learning curve should level off at a high accuracy. To address these performance indicators, we propose a multi-level VUI architecture and investigate the effectiveness of alternative processing schemes. In the low-level layer, we compare MIDA features (Mutual Information Discrimination Analysis) against conventional MFCC features. In the mid-level layer, we enhance the acoustic representation by means of phone posteriorgrams and clustering procedures. In the high-level layer, we use the NMF (Non-negative Matrix Factorization) procedure, which has been demonstrated to be an effective approach for word learning. We evaluate and discuss the performance and the feasibility of our approach in a realistic experimental setting of the VUI-user learning context.
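The abstract does not spell out how NMF is applied in the high-level layer, so the following is only a minimal sketch of one plausible setup, not the authors' implementation. It assumes that each utterance is a non-negative column vector, that label rows and acoustic rows are stacked so they share one activation matrix, and that the standard multiplicative updates for the Kullback-Leibler divergence are used. All function names (`nmf_kl`, `learn_words`, `decode`) and the toy data are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_divergence(V, WH):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return float(np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH))


def nmf_kl(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Approximate V (features x utterances) by W @ H with non-negative factors.

    Uses the standard multiplicative updates for the KL divergence.
    If W_fixed is given, only the activations H are estimated.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 0.1 if W_fixed is None else W_fixed
    H = rng.random((rank, V.shape[1])) + 0.1
    for _ in range(n_iter):
        H = H * (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        if W_fixed is None:
            W = W * ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H


def learn_words(V_labels, V_acoustic, rank, n_iter=200, seed=0):
    """Factorize stacked label and acoustic rows so both share one H (assumed setup)."""
    n_labels = V_labels.shape[0]
    W, _ = nmf_kl(np.vstack([V_labels, V_acoustic]), rank, n_iter, seed=seed)
    return W[:n_labels], W[n_labels:]  # label part, acoustic part


def decode(W_labels, W_acoustic, V_acoustic_new, n_iter=200):
    """Estimate activations from acoustics alone, then map them to label scores."""
    _, H = nmf_kl(V_acoustic_new, W_acoustic.shape[1], n_iter, W_fixed=W_acoustic)
    return W_labels @ H


# Toy data: random stand-ins for per-word acoustic patterns and utterance labels.
rng = np.random.default_rng(1)
n_words, n_acoustic, n_utts = 3, 12, 40
patterns = rng.random((n_acoustic, n_words))
labels = (rng.random((n_words, n_utts)) < 0.4).astype(float)
acoustics = patterns @ labels  # each utterance is a sum of its words' patterns

W_lab, W_ac = learn_words(labels, acoustics, rank=n_words)
scores = decode(W_lab, W_ac, acoustics[:, :5])
print(scores.shape)  # (3, 5): one score per word and utterance
```

In a real system the acoustic rows would presumably come from the low- and mid-level layers (for example, statistics derived from MFCC or MIDA features, or from phone posteriorgrams) rather than from random patterns; the sketch only illustrates how a shared activation matrix can link acoustics to word labels.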
