Wordless Sounds: Robust Speaker Diarization Using Privacy-Preserving Audio Representations

Parthasarathi S. H. K.; Bourlard H.; Gatica-Perez D.


Abstract

This paper investigates robust privacy-sensitive audio features for speaker diarization in multiparty conversations, i.e., a set of audio features carrying little linguistic information, for speaker diarization in single and multiple distant microphone scenarios. We systematically investigate the Linear Prediction (LP) residual. Issues such as the prediction order and the choice of representation of the LP residual are studied. Additionally, we explore the combination of the LP residual with subband information from 2.5 kHz to 3.5 kHz and spectral slope. Next, we propose a supervised framework using a deep neural architecture for deriving privacy-sensitive audio features. We benchmark these approaches against the traditional Mel Frequency Cepstral Coefficients (MFCC) features for speaker diarization in both microphone scenarios. Experiments on the RT07 evaluation dataset show that the proposed approaches yield diarization performance close to that of the MFCC features on the single distant microphone dataset. To objectively evaluate the notion of privacy in terms of linguistic information, we perform human and automatic speech recognition tests, showing that the proposed approaches to privacy-sensitive audio features yield much lower recognition accuracies than MFCC features.
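As a rough illustration of the LP residual named in the abstract, the sketch below computes a per-frame residual with the autocorrelation method and the Levinson-Durbin recursion. The abstract does not say how the residual was estimated, so this method, the function names `lp_coefficients` and `lp_residual`, and the choice of prediction order are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[0..order] (a[0] = 1) for one frame.

    Assumption: autocorrelation method solved with Levinson-Durbin.
    The prediction error filter is A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Degenerate frame (e.g. silence): keep remaining coefficients at 0.
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order):
    """Return e[n] = sum_k a[k] * x[n-k], the LP residual of the frame."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # Inverse filtering with zero initial conditions, truncated to frame length.
    return np.convolve(frame, a)[: len(frame)]
```

In use, a windowed speech frame would be passed as `frame`, and `order` would be varied to study how much spectral envelope (and hence linguistic content) is removed, which is the kind of prediction-order question the abstract mentions.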

Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2013, No. 1, pp. 83-96 (14 pages)
Authors: Parthasarathi S. H. K.; Bourlard H.; Gatica-Perez D.
Affiliation: International Computer Science Institute, Berkeley, CA, USA
Original format: PDF
Language: English
Keywords: LP residual; Privacy sensitive audio features; deep neural networks; listening tests; speaker diarization

Similar literature

1. Survey Of Privacy-Preserving Audio Representations With Speaker Diarization [J]. S.Sathyapriya M.phil, A.Indhumathi. International Journal of Computer Trends and Technology. 2013, No. 9
2. Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study [J]. Mihelic France, Vesnicer Bostjan, Zibert Janez. Journal of computing and information technology. 2008, No. 3
3. Development Of A Speaker Diarization System For Speaker Tracking In Audio Broadcast News: A Case Study [J]. Janez Zibert, Bostjan Vesnicer, France Mihelic. Journal of Computing and Information Technology. 2008, No. 3
4. Audio-Video Speaker Diarization for Unsupervised Speaker and Face Model Creation [C]. Pavel Campr, Marie Kunesova, Jan Vanek. International conference on text, speech and dialogue. 2014
5. Robust Speaker Modeling in Non-Neutral Environments with Application to Large Scale Multi-Speaker Audio Streams [D]. Yu, Chengzhu. 2017
6. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model [O]. Rehan Ahmad, Syed Zubair, Hani Alquhayz. 2019
7. Wordless Sounds: Robust Speaker Diarization using Privacy-Preserving Audio Representations [O]. Sree Hari, Krishnan Parthasarathi, Student Member. 2015
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. [R]. Hansen, J. H. 2015
