Odyssey 2010: The Speaker and Language Recognition Workshop

Online Diarization of Telephone Conversations

Abstract

Speaker diarization systems attempt to segment and label a conversation between R speakers when no prior information about the conversation is given. In short, a diarization system tries to answer the question "Who spoke when?".

Most state-of-the-art diarization systems operate in an off-line mode: all samples of the audio stream must be available before the diarization algorithm is applied. Off-line diarization algorithms generally rely on a dendrogram or hierarchical clustering approach.

Several on-line diarization systems have been suggested previously. However, most require some prior information, or speaker and background models trained off-line, in order to conduct all or part of the diarization process.

This study suggests a new two-stage algorithm for on-line diarization of telephone conversations. In the first stage, a fully unsupervised diarization algorithm is applied to an initial training set taken from the conversation. This stage generates the speaker and non-speech models and tunes a hyper-state Hidden Markov Model (HMM) to be used in the second, on-line stage of diarization.

On-line diarization is then applied by means of time-series clustering using the Viterbi dynamic programming algorithm. This approach provides diarization results a few milliseconds after either a user request or the end of the conversation.

To evaluate diarization performance, the system was applied to 2,048 five-minute, two-speaker conversations extracted from the NIST 2005 Speaker Recognition Evaluation.

The on-line Diarization Error Rate (DER) is shown to approach the "optimal" DER (achieved by applying unsupervised diarization to the entire conversation) as the length of the initial training set increases. Using an initial training set of 2 minutes and applying on-line diarization to the entire conversation incurred an increase of approximately 4% in DER compared to the "optimal" DER.
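
The abstract names Viterbi decoding over an HMM as the on-line stage but gives no topology or parameters. As a rough illustration only, the following is a minimal sketch of generic Viterbi decoding over per-frame log-likelihoods. It is not the authors' hyper-state HMM: the three states (speaker A, speaker B, non-speech), the "sticky" transition probabilities, and the toy emission values are all assumptions made up for this example.

```python
import math


def viterbi(log_emissions, log_trans, log_init):
    """Return the most likely state sequence for a sequence of frames.

    log_emissions[t][s]: log-likelihood of frame t under state s
    log_trans[i][j]:     log-probability of moving from state i to state j
    log_init[s]:         log-probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s]: best log-score of any path that ends in state s at the current frame
    delta = [log_init[s] + log_emissions[0][s] for s in range(n_states)]
    backptr = []  # backptr[t - 1][s]: best predecessor of state s at frame t
    for frame in log_emissions[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + frame[j])
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical setup: state 0 = speaker A, 1 = speaker B, 2 = non-speech.
STAY, SWITCH = 0.9, 0.05  # assumed "sticky" transitions, not taken from the paper
log_trans = [[math.log(STAY if i == j else SWITCH) for j in range(3)] for i in range(3)]
log_init = [math.log(1 / 3)] * 3

# Toy per-frame likelihoods; in a real system these would come from the
# speaker and non-speech models estimated on the initial training set.
frame_likelihoods = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],  # isolated frame that slightly favours speaker B
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
]
log_emissions = [[math.log(p) for p in frame] for frame in frame_likelihoods]

path = viterbi(log_emissions, log_trans, log_init)
print(path)
```

With these assumed numbers, the transition penalty outweighs the single frame that marginally prefers speaker B, so the decoded path keeps that frame with speaker A and only changes state when the non-speech evidence persists. This smoothing effect is the usual reason to decode with an HMM rather than label each frame independently.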
