IEEE International Conference on Acoustics, Speech and Signal Processing

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Abstract

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers. The system is based on the Encoder-Decoder-Attractor (EDA) architecture of Horiguchi et al., but uses an incremental Transformer encoder that attends only to its left context and uses block-level recurrence in the hidden states to carry information from block to block, making the algorithm's complexity linear in time. We propose two variants. For unlimited-latency BW-EDA-EEND, which processes inputs in linear time, we show only moderate degradation for up to two speakers using a context size of 10 seconds compared to offline EDA-EEND. With more than two speakers, the accuracy gap between online and offline grows, but the algorithm still outperforms a baseline offline clustering diarization system for one to four speakers with unlimited context size, and shows comparable accuracy with a context size of 10 seconds. For limited-latency BW-EDA-EEND, which produces diarization outputs block by block as audio arrives, we show accuracy comparable to the offline clustering-based system.
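The block-wise, left-context-only attention described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single attention head and a single layer, so the "hidden states" carried between blocks are simply that layer's inputs. The function name `blockwise_left_context_attention` and the parameters `block_size` and `context_size` (both counted in frames) are hypothetical names chosen for illustration. It also assumes that frames within a block may attend to the whole current block.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def blockwise_left_context_attention(frames, w_q, w_k, w_v,
                                     block_size, context_size):
    """Single-head attention computed block by block (illustrative only).

    Each block forms queries from its own frames and keys/values from a
    cache of at most `context_size` earlier frames plus the block itself.
    No frame from a later block is ever visible, so outputs can be emitted
    as soon as a block has arrived.
    """
    d = w_q.shape[1]
    # Hypothetical cache: left-context states carried from block to block.
    cache = np.zeros((0, frames.shape[1]))
    outputs = []
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        memory = np.concatenate([cache, block], axis=0)
        q = block @ w_q
        k = memory @ w_k
        v = memory @ w_v
        scores = q @ k.T / np.sqrt(d)
        outputs.append(softmax(scores) @ v)
        # Keep only the most recent `context_size` frames for the next block.
        cache = memory[max(0, len(memory) - context_size):]
    return np.concatenate(outputs, axis=0)
```

Because each block attends to at most `context_size + block_size` frames, the work per block is bounded in this sketch, so total cost grows linearly with the number of blocks rather than quadratically with the recording length.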
