International Conference on Multimedia Modeling

Inferring Emphasis for Real Voice Data: An Attentive Multimodal Neural Network Approach



Abstract

To understand speakers' attitudes and intentions in real Voice Dialogue Applications (VDAs), effective emphasis inference from users' queries may play an important role. However, in VDAs there are a tremendous number of uncertain speakers with a great diversity of dialects and expression preferences, which challenges traditional emphasis detection methods. In this paper, to better infer emphasis for real voice data, we propose an attentive multimodal neural network. Specifically, besides the acoustic features, extensive textual features are applied in modelling. Then, considering feature independency, we model the multimodal features with a multi-path convolutional neural network (MCNN). Furthermore, combining the high-level multimodal features, we train an emphasis classifier by attending to the textual features with an attention-based bidirectional long short-term memory network (ABLSTM), so as to comprehensively learn discriminative features from diverse users. Our experimental study on a real-world dataset collected from Sogou Voice Assistant (https://yy.sogou.com/) shows that our method outperforms alternative baselines by 1.0-15.5% in terms of F1 measure.
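To make the described pipeline concrete, the sketch below outlines one plausible PyTorch realization of the architecture named in the abstract: separate convolutional paths over acoustic and textual feature streams (the multi-path CNN), an attention-based bidirectional LSTM over the textual stream, and a classifier over the fused representation. All layer sizes, feature dimensions, and the per-query classification output are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn as nn

class AttentiveMultimodalNet(nn.Module):
    """Minimal sketch of an MCNN + ABLSTM emphasis classifier.
    Dimensions and the per-query output are assumptions for illustration."""

    def __init__(self, acoustic_dim=40, text_dim=300, conv_channels=64,
                 lstm_hidden=128, num_classes=2):
        super().__init__()
        # One convolutional path per modality (the "multi-path" idea).
        self.acoustic_path = nn.Sequential(
            nn.Conv1d(acoustic_dim, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),          # summarize acoustic frames
        )
        self.text_path = nn.Sequential(
            nn.Conv1d(text_dim, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Bidirectional LSTM over the textual path's token-level features.
        self.bilstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Additive attention over BiLSTM outputs ("attending to textual features").
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        # Classifier over concatenated acoustic summary and attended text vector.
        self.classifier = nn.Linear(conv_channels + 2 * lstm_hidden, num_classes)

    def forward(self, acoustic, text):
        # acoustic: (batch, acoustic_dim, frames); text: (batch, text_dim, tokens)
        a = self.acoustic_path(acoustic).squeeze(-1)       # (batch, conv_channels)
        t = self.text_path(text).transpose(1, 2)           # (batch, tokens, conv_channels)
        h, _ = self.bilstm(t)                              # (batch, tokens, 2*lstm_hidden)
        weights = torch.softmax(self.attn(h), dim=1)       # attention over tokens
        t_attended = (weights * h).sum(dim=1)              # (batch, 2*lstm_hidden)
        fused = torch.cat([a, t_attended], dim=-1)
        return self.classifier(fused)                      # emphasis logits

Keeping one convolutional path per modality mirrors the feature-independency motivation: each modality is summarized separately before fusion, and only the textual stream is re-read sequentially by the attention-based BiLSTM.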
