Language Identification based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs

Abstract

This paper aims to enhance spoken language identification methods based on direct discriminative modeling of language labels using deep neural networks (DNNs) and long short-term memory recurrent neural networks (LSTM-RNNs). In conventional methods, frame-by-frame DNNs or LSTM-RNNs are used for utterance-level classification. Although they have strong frame-level classification performance and real-time efficiency, they are not optimized for variable-length utterance-level classification, since the classification is conducted by simply averaging frame-level prediction results. In addition, this simple classification methodology cannot fully utilize the combination of DNNs and LSTM-RNNs. To address these issues, our idea is to combine the frame-by-frame DNNs and LSTM-RNNs with a sequential generative-model-based classifier. In the proposed method, we regard posteriorgram sequences generated from a frame-by-frame classifier as feature sequences, and model them for each language using language modeling technologies. The generative-model-based classifier does not model an identification boundary, so we can flexibly deal with variable-length utterances without losing the conventional advantages. Furthermore, the proposed method can support the combination of DNNs and LSTM-RNNs using joint posteriorgram sequences, whose generative modeling can capture differences between the two posteriorgram sequences. Experiments conducted using the GlobalPhone database demonstrate the proposed method's effectiveness.
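The contrast drawn in the abstract, averaging frame-level posteriors versus scoring the posteriorgram sequence with one generative model per language, can be illustrated with a minimal sketch. The abstract does not say which language-modeling technique or feature representation the authors use, so everything below is an assumption for illustration: frames are quantized to the index of their largest posterior, and each language gets an add-one smoothed bigram model. The names `average_posterior_decision`, `to_symbols`, `BigramModel`, and `classify`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def average_posterior_decision(posteriorgram):
    """Conventional baseline: average frame-level posteriors, return the top index."""
    n_classes = len(posteriorgram[0])
    means = [sum(frame[k] for frame in posteriorgram) / len(posteriorgram)
             for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: means[k])


def to_symbols(posteriorgram):
    """Assumed quantization: each frame becomes the index of its largest posterior."""
    return [max(range(len(frame)), key=lambda k: frame[k]) for frame in posteriorgram]


class BigramModel:
    """Add-one smoothed bigram model over symbol sequences (illustrative stand-in)."""

    def __init__(self, n_symbols):
        self.n_symbols = n_symbols
        self.pair_counts = Counter()     # counts of (previous, current) symbol pairs
        self.context_counts = Counter()  # counts of the previous symbol alone

    def fit(self, sequences):
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.pair_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1
        return self

    def log_likelihood(self, seq):
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            prob = (self.pair_counts[(prev, cur)] + 1) / (
                self.context_counts[prev] + self.n_symbols)
            total += math.log(prob)
        return total


def classify(posteriorgram, models):
    """Pick the language whose generative model gives the sequence the highest score."""
    seq = to_symbols(posteriorgram)
    return max(models, key=lambda lang: models[lang].log_likelihood(seq))


# Toy training data: symbol sequences per language (invented for illustration).
models = {
    "lang_a": BigramModel(2).fit([[0, 0, 1, 0, 0, 1, 0, 0]]),
    "lang_b": BigramModel(2).fit([[1, 1, 0, 1, 1, 0, 1, 1]]),
}

# Toy test utterance: four frames of posteriors over two classes.
utterance = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
print(average_posterior_decision(utterance), classify(utterance, models))
```

Because every language model scores the same sequence, utterance length affects all candidates equally, which is one way to read the abstract's claim about variable-length utterances. Under the same assumptions, a joint sequence could be formed by pairing the DNN and LSTM-RNN symbols for each frame as tuples and enlarging `n_symbols` accordingly.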

Bibliographic record

Source
Annual Conference of the International Speech Communication Association | 2016 | pp. 3106-3887 | 5 pages
Authors
Ryo Masumura; Taichi Asami; Hirokazu Masataki; Yushi Aono; Sumitaka Sakauchi;
Original format: PDF
Chinese Library Classification: TB95-53
