IEEE/ACM Transactions on Audio, Speech, and Language Processing

Active Learning Based Constrained Clustering For Speaker Diarization

Abstract

Most speaker diarization research has focused on unsupervised scenarios, where no human supervision is available. However, in many real-world applications a certain amount of human input can be expected, especially when minimal human supervision brings a significant performance improvement. In this study, we propose an active learning based bottom-up speaker clustering algorithm that effectively improves speaker diarization performance with limited human input. Specifically, the proposed active learning based speaker clustering has two stages: explore and constrained clustering. The explore stage aims to quickly discover at least one sample for each speaker, so that the speaker clustering process starts from reliable initial speaker clusters. After all, or a majority, of the involved speakers have been discovered in the explore stage, constrained clustering is performed. Constrained clustering is similar to the traditional bottom-up clustering process, with one important difference: the clusters created during the explore stage are not allowed to merge with each other. Constrained clustering continues until only the clusters generated in the explore stage are left. Since the objective of the active learning based speaker clustering algorithm is to provide good initial speaker models, performance saturates once each cluster has sufficient examples. To further improve diarization performance as human input increases, we propose a second method that actively selects the speech segments accounting for the largest expected speaker error under the existing cluster assignments, and sends them for human evaluation and reassignment. The algorithms are evaluated on our recently created Apollo Mission Control Center dataset as well as the augmented multiparty interaction (AMI) meeting corpus. The results indicate that the proposed active learning algorithms can reduce the diarization error rate significantly with a relatively small amount of human supervision.
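The explore and constrained clustering stages described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: each speech segment is represented by a fixed-length speaker embedding, cluster similarity is cosine similarity between centroids, and the explore stage uses farthest-first querying (always asking about the segment least similar to everything labeled so far) as one plausible way to reach unseen speakers quickly. `ask_human`, `explore`, and `constrained_clustering` are hypothetical names standing in for the human annotator and the two stages; they are not taken from the paper.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of the member embeddings of a cluster."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def explore(embeddings: List[Vector], ask_human: Callable[[int], str],
            budget: int) -> Dict[str, List[int]]:
    """Explore stage (assumed farthest-first querying): repeatedly ask the human
    to label the segment least similar to every segment labeled so far."""
    queried = [0]  # hypothetical starting point: the first segment
    seeds: Dict[str, List[int]] = {ask_human(0): [0]}
    while len(queried) < min(budget, len(embeddings)):
        candidates = [i for i in range(len(embeddings)) if i not in queried]
        # The candidate whose closest labeled segment is farthest away is the
        # most likely to come from a speaker not yet discovered.
        nxt = min(candidates, key=lambda i: max(
            cosine(embeddings[i], embeddings[q]) for q in queried))
        queried.append(nxt)
        seeds.setdefault(ask_human(nxt), []).append(nxt)
    return seeds


def constrained_clustering(embeddings: List[Vector],
                           seeds: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Bottom-up merging in which two seed clusters may never merge.
    Stops when only seed clusters remain (requires at least one seed)."""
    seeded = {i for members in seeds.values() for i in members}
    # Each cluster is (speaker label, member indices); label None = not a seed.
    clusters: List[Tuple[Optional[str], List[int]]] = [
        (label, list(members)) for label, members in seeds.items()]
    clusters += [(None, [i]) for i in range(len(embeddings)) if i not in seeded]
    while any(label is None for label, _ in clusters):
        cents = [centroid([embeddings[i] for i in m]) for _, m in clusters]
        best, pair = -math.inf, (0, 0)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if clusters[a][0] is not None and clusters[b][0] is not None:
                    continue  # cannot-link: seed clusters never merge
                sim = cosine(cents[a], cents[b])
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        # The merged cluster inherits the seed label, if either side has one.
        label = clusters[a][0] if clusters[a][0] is not None else clusters[b][0]
        merged = (label, clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return {label: sorted(members) for label, members in clusters}
```

For example, with toy two-dimensional embeddings and `truth` playing the annotator, `seeds = explore(emb, lambda i: truth[i], budget=2)` followed by `constrained_clustering(emb, seeds)` assigns every remaining segment to one of the labeled seed clusters. Apart from the cannot-link check between seed clusters and the stopping rule, the merge loop is ordinary centroid-based agglomerative clustering.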
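The second method, which selects the segments with the largest expected speaker error for human review, can also be sketched under explicit assumptions. The abstract does not say how the expected error is computed. Here it is assumed, hypothetically, to be a segment's duration times the probability that its current cluster is wrong, with cluster posteriors taken as a softmax over cosine similarity to each cluster centroid (the `temperature` of 0.1 is an arbitrary illustrative value). `expected_errors`, `review_round`, and `ask_human` are hypothetical names, not the paper's.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_centroids(embeddings: List[Vector],
                      assignment: List[str]) -> Dict[str, List[float]]:
    """Mean embedding of every cluster in the current assignment."""
    groups: Dict[str, List[Vector]] = {}
    for vec, label in zip(embeddings, assignment):
        groups.setdefault(label, []).append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in groups.items()}


def expected_errors(embeddings: List[Vector], durations: List[float],
                    assignment: List[str], temperature: float = 0.1) -> List[float]:
    """Assumed error model: duration * (1 - posterior of the assigned cluster),
    where posteriors are a softmax over cosine similarity to each centroid."""
    cents = cluster_centroids(embeddings, assignment)
    errors = []
    for vec, dur, label in zip(embeddings, durations, assignment):
        scores = {c: cosine(vec, cent) / temperature for c, cent in cents.items()}
        top = max(scores.values())  # subtract the max for a stable softmax
        norm = sum(math.exp(s - top) for s in scores.values())
        p_assigned = math.exp(scores[label] - top) / norm
        errors.append(dur * (1.0 - p_assigned))
    return errors


def review_round(embeddings: List[Vector], durations: List[float],
                 assignment: List[str], ask_human: Callable[[int], str],
                 k: int, verified: Set[int]) -> List[str]:
    """Send the k unverified segments with the largest expected error to the
    human, apply the returned labels, and mark those segments as verified."""
    errors = expected_errors(embeddings, durations, assignment)
    pool = [i for i in range(len(embeddings)) if i not in verified]
    chosen = sorted(pool, key=lambda i: errors[i], reverse=True)[:k]
    updated = list(assignment)  # leave the caller's assignment untouched
    for i in chosen:
        updated[i] = ask_human(i)
        verified.add(i)
    return updated
```

Calling `review_round` repeatedly, with a shared `verified` set, spends additional human effort where the current clustering is least certain, which is how this kind of method can keep improving after the seed clusters are in place. Weighting by duration is a design choice in this sketch: diarization error rate is measured over time, so a long misassigned segment costs more than a short one.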