Short- and long-term speech features for hybrid HMM-i-vector based speaker diarization system

Abstract

In recent years, i-vectors have been applied successfully to speaker recognition tasks. This work assesses the suitability of i-vector modeling within the framework of the speaker diarization task. In this context, a weighted cosine distance between two different sets of i-vectors is proposed for speaker clustering. Speech clusters generated by Viterbi segmentation are first modeled by two different i-vectors. Whilst the first i-vector represents the distribution of the commonly used short-term Mel Frequency Cepstral Coefficients, the second one models a selection of voice quality and prosodic features. In order to combine both short- and long-term speech statistics, the cosine-distance scores of those two i-vectors are linearly weighted to obtain a single similarity score. The final fused score is then used as the speaker clustering distance. Experimental results on two different evaluation sets of the Augmented Multi-party Interaction corpus show the suitability of combining both sources of information within the i-vector space. They also show that the i-vector based clustering technique provides a significant improvement, in terms of diarization error rate, over clustering based on Gaussian Mixture Modeling. Furthermore, this work reports a significant speaker error reduction when short-term i-vector clustering is augmented with a second i-vector estimated from voice quality and prosody related speech features.
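The score fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that the two cosine scores are linearly weighted, so the convex form `alpha * s_short + (1 - alpha) * s_long`, the weight name `alpha`, its default value, and the function names below are all assumptions made for illustration. The i-vectors are taken as already extracted and are represented as plain lists of floats.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two i-vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("i-vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine score is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def fused_score(
    short_a: Sequence[float],
    short_b: Sequence[float],
    long_a: Sequence[float],
    long_b: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight; the abstract gives no value
) -> float:
    """Linearly weight the short-term and long-term cosine scores.

    short_* are the MFCC-based i-vectors of clusters A and B;
    long_* are their voice-quality/prosody-based i-vectors.
    The convex combination used here is an assumed form of the
    linear weighting mentioned in the abstract.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    s_short = cosine_score(short_a, short_b)
    s_long = cosine_score(long_a, long_b)
    return alpha * s_short + (1.0 - alpha) * s_long
```

In an agglomerative clustering loop, a score of this kind would be computed for every pair of clusters, and the pair with the highest fused score would be the candidate for merging. Setting `alpha = 1.0` reduces the sketch to clustering on the short-term i-vectors alone.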