IEEE Workshop on Spoken Language Technology

Speaker independent diarization for child language environment analysis using deep neural networks



Abstract

Large-scale monitoring of the child language environment, by measuring the amount of speech directed at the child by other children and adults during vocal communication, is an important task. Using audio extracted from a recording unit worn by a child within a childcare center, our proposed diarization system determines, at each point in time, the content of the child's language environment by categorizing the audio into one of four major categories: (1) speech initiated by the child wearing the recording unit, speech originated by other (2) children or (3) adults and directed at the primary child, and (4) non-speech content. In this study, we exploit complex Hidden Markov Models (HMMs) with multiple states to model the temporal dependencies between different sources of acoustic variability, and estimate the HMM state output probabilities using deep neural networks as a discriminative modeling approach. The proposed system is robust against common diarization errors caused by rapid turn-taking, between-class similarities, and background noise, without requiring prior clustering techniques. The experimental results confirm that this approach outperforms state-of-the-art Gaussian mixture model based diarization without the need for bottom-up clustering, and leads to a 22.24% relative error reduction.
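To illustrate the hybrid DNN-HMM decoding idea described in the abstract, the sketch below runs Viterbi decoding over the four diarization classes, treating per-frame class posteriors (which in the real system would come from the trained network's softmax layer) as observation scores and using sticky self-transitions to smooth rapid turn-taking. The class indices, transition values, and toy posteriors are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Four diarization classes from the abstract (illustrative ordering).
CLASSES = ["primary_child", "other_child", "adult", "non_speech"]

def viterbi(log_obs, log_trans, log_init):
    """Most likely state sequence given per-frame log observation scores."""
    T, S = log_obs.shape
    delta = log_init + log_obs[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # (prev_state, next_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy per-frame DNN posteriors (rows sum to 1); assumed, not real output.
post = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],   # a briefly ambiguous frame
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.6, 0.2],
])

# Standard hybrid DNN-HMM practice: divide posteriors by class priors to
# obtain scaled likelihoods; uniform priors are assumed here.
priors = np.full(4, 0.25)
log_obs = np.log(post / priors)

# Sticky self-transitions discourage implausibly rapid speaker turns.
stay = 0.85
trans = np.full((4, 4), (1 - stay) / 3)
np.fill_diagonal(trans, stay)

states = viterbi(log_obs, np.log(trans), np.log(priors))
labels = [CLASSES[s] for s in states]
print(labels)
# → ['primary_child', 'primary_child', 'primary_child', 'adult', 'adult']
```

Note how the ambiguous third frame (posterior 0.4 for "other_child" vs. 0.3 for "primary_child") is smoothed to "primary_child" by the HMM's temporal model, which is exactly the kind of rapid-turn-taking error the paper's system is designed to resist.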
