BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Abstract

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the Encoder-Decoder-Attractor (EDA) architecture of Horiguchi et al., but utilizes the incremental Transformer encoder, attending only to its left contexts and using block-level recurrence in the hidden states to carry information from block to block, making the algorithm complexity linear in time. We propose two variants: For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND. With more than two speakers, the accuracy gap between online and offline grows, but the algorithm still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block-by-block as audio arrives, we show accuracy comparable to the offline clustering-based system.
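
The block-wise processing described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `blockwise_attention`, the single-head attention without learned projections, and the `context_blocks` parameter are all assumptions made for illustration. Each block attends only to itself and to cached hidden states from earlier blocks, so the cost per block is bounded and the total cost grows linearly with the number of frames.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blockwise_attention(frames, block_size, context_blocks=1):
    """Hypothetical single-head self-attention applied block by block.

    Queries come from the current block. Keys and values come from the
    cached outputs of up to `context_blocks` earlier blocks (the left
    context) plus the current block. No future frame is ever visible.
    """
    n_frames, dim = frames.shape
    cache = []    # hidden states carried from block to block
    outputs = []
    for start in range(0, n_frames, block_size):
        block = frames[start:start + block_size]
        left = cache[-context_blocks:] if context_blocks > 0 else []
        memory = np.concatenate(left + [block])
        scores = block @ memory.T / np.sqrt(dim)
        out = softmax(scores) @ memory
        outputs.append(out)
        cache.append(out)
    return np.concatenate(outputs)

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4))
y = blockwise_attention(x, block_size=4)
```

Because the memory never exceeds `(context_blocks + 1) * block_size` frames, changing later frames leaves the outputs of earlier blocks unchanged, which is the property a streaming system needs.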

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2021 | pp. 7193-7197 | 5 pages
Authors
Eunjung Han; Chul Lee; Andreas Stolcke
Original format: PDF
Keywords
Degradation; Training; Signal processing algorithms; Clustering algorithms; Training data; Telephone sets; Speech

Similar literature

1. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news [J]. Dabbabi Karim, Hajji Salah, Cherif Adnen. International Journal of Speech Technology. 2019, No. 4
2. Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information [J]. Ishiguro K., Yamada T., Araki S. IEEE Transactions on Audio, Speech, and Language Processing. 2012, No. 2
3. Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study [J]. Mihelic France, Vesnicer Bostjan, Zibert Janez. Journal of Computing and Information Technology. 2008, No. 3
4. End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings [C]. Soumi Maiti, Hakan Erdogan, Kevin Wilson. IEEE International Conference on Acoustics, Speech and Signal Processing. 2021
5. Automatic Speaker Recognition and Diarization in Co-Channel Speech [D]. Shokouhi, Navid. 2017
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020
7. End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors [O]. Shota Horiguchi, Yusuke Fujita, Shinji Watanabe. 2020
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015
