Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams

Abstract

In this study, we develop a new system for real-world audio environment matching. Environment detection within unknown audio streams requires a system that operates in an unsupervised manner, since it will encounter unknown environments without prior information. In addition, the overall solution should be computationally efficient for large audio collections. In the proposed approach, a Gaussian mixture model (GMM) is trained on large amounts of unlabeled audio data and used as a background acoustic model. Subsequently, an acoustic signature vector (ASV) is computed for each environment. The ASV is designed to capture the unique acoustic characteristics of an environment. Using the ASVs, we demonstrate that it is possible to compute an effective similarity measure between two acoustic environments. We demonstrate the performance of the proposed system on real-world audio data and compare it to a traditional GMM-UBM (universal background model) system. Experiments show that our system achieves an equal error rate (EER) that is 35% better than that of a baseline GMM-UBM system.
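
The abstract does not say how the ASV or the similarity measure is computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a background GMM with diagonal covariances whose parameters are already trained, takes the ASV of a segment to be the average posterior probability of each mixture component over the segment's frames, and compares two ASVs with cosine similarity. The names `signature`, `cosine_similarity`, and the toy model parameters are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at one frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given one frame."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def signature(frames, weights, means, variances):
    """Assumed ASV: mean component posterior over a segment's frames."""
    acc = [0.0] * len(weights)
    for x in frames:
        for k, p in enumerate(posteriors(x, weights, means, variances)):
            acc[k] += p
    return [a / len(frames) for a in acc]

def cosine_similarity(u, v):
    """Assumed similarity measure between two signature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy background model (made-up parameters): two components, 1-D features.
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0], [5.0]]
VARIANCES = [[1.0], [1.0]]

# Made-up segments: two from one environment, one from a different one.
quiet_a = [[0.1], [-0.3], [0.2]]
quiet_b = [[0.0], [0.4], [-0.1]]
noisy = [[5.2], [4.8], [5.1]]

asv_a = signature(quiet_a, WEIGHTS, MEANS, VARIANCES)
asv_b = signature(quiet_b, WEIGHTS, MEANS, VARIANCES)
asv_noisy = signature(noisy, WEIGHTS, MEANS, VARIANCES)

same_env = cosine_similarity(asv_a, asv_b)
diff_env = cosine_similarity(asv_a, asv_noisy)
```

In this toy setup, `same_env` comes out higher than `diff_env`, which is the behavior a detection threshold would rely on. Because each signature has one entry per mixture component, comparing two environments costs a single dot product regardless of segment length.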

Bibliographic details

Source
Annual Conference of the International Speech Communication Association | 2012 | pp. 2513-2516 | 4 pages
Authors
Ali Ziaei; Abhijeet Sangwan; John H.L. Hansen
Original format: PDF
Keywords
Audio Environment Detection; Acoustic Signature; Real world audio data; Prof-Life-Log

Similar literature

1. Leveraging Frequency-Dependent Kernel and DIP-Based Clustering for Robust Speech Activity Detection in Naturalistic Audio Streams [J]. Harishchandra Dubey, Abhijeet Sangwan, John H. L. Hansen. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, no. 11.
2. Audio-Facial Laughter Detection in Naturalistic Dyadic Conversations [J]. Bekir Berker Turker, Yucel Yemez, T. Metin Sezgin. IEEE Transactions on Affective Computing, 2017, no. 4.
3. Monitoring of audio visual quality by key indicators: Detection of selected audio and audiovisual artefacts [J]. Blanco Fernandez Ignacio, Leszczuk Mikolaj. Multimedia Tools and Applications, 2018, no. 2.
4. Prof-Life-Log: Audio Environment Detection for Naturalistic Audio Streams [C]. Ali Ziaei, Abhijeet Sangwan, John H.L. Hansen. INTERSPEECH 2012, 2012.
5. Prof-Life-Log: Speech and speaker advancements for massive naturalistic audio streams [D]. Ziaei, Ali. 2015.
6. The phase of cortical oscillations determines the perceptual fate of visual cues in naturalistic audiovisual speech [O]. Raphaël Thézé, Anne-Lise Giraud, Pierre Mégevand. 2020.
7. Toeplitz Inverse Covariance Based Robust Speaker Clustering for Naturalistic Audio Streams [O]. Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen. 2019.
