Online Diarization of Telephone Conversations

Abstract

Speaker diarization systems attempt to segment and label a conversation between R speakers when no prior information about the conversation is given. In essence, a diarization system tries to answer the question "Who spoke when?".

Most state-of-the-art diarization systems operate in an off-line mode; that is, all samples of the audio stream are required before the diarization algorithm is applied. Off-line diarization algorithms generally rely on a dendrogram or hierarchical clustering approach.

Several on-line diarization systems have been suggested previously; however, most require some prior information or off-line trained speaker and background models to conduct all or part of the diarization process.

This study proposes a new two-stage on-line diarization algorithm for telephone conversations. In the first stage, a fully unsupervised diarization algorithm is applied to an initial training set taken from the conversation. This stage generates the speaker and non-speech models and tunes a hyper-state Hidden Markov Model (HMM) to be used in the second, on-line stage of diarization.

On-line diarization is then applied by means of time-series clustering using the Viterbi dynamic programming algorithm. This approach provides diarization results a few milliseconds after either a user request or the end of the conversation.

To evaluate diarization performance, the system was applied to 2048 five-minute, two-speaker conversations extracted from the NIST 2005 Speaker Recognition Evaluation.

The on-line Diarization Error Rate (DER) is shown to approach the "optimal" DER (achieved by applying unsupervised diarization to the entire conversation) as the length of the initial training set increases. Using an initial training set of 2 minutes and applying on-line diarization to the entire conversation incurred an increase of approximately 4% in DER compared to the "optimal" DER.
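
The abstract's second stage labels frames by Viterbi decoding over an HMM. The following is a minimal sketch of plain Viterbi decoding, not the authors' hyper-state HMM: the three states, the transition probabilities, and the per-frame likelihoods are all assumed toy values, chosen only to show how sticky transitions smooth out an isolated frame that would otherwise be assigned to the wrong speaker.

```python
import math

def viterbi(log_lik, log_trans, log_init):
    """Return the most likely state sequence.

    log_lik[t][s]   : log-likelihood of frame t under state s
    log_trans[i][j] : log-probability of moving from state i to state j
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    # delta[s] = best log-score of any path ending in state s at the current frame
    delta = [log_init[s] + log_lik[0][s] for s in range(n_states)]
    backptr = []
    for frame in log_lik[1:]:
        prev = delta
        pointers, delta = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best_i)
            delta.append(prev[best_i] + log_trans[best_i][j] + frame[j])
        backptr.append(pointers)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Toy setup (assumed values). States: 0 = non-speech, 1 = speaker A, 2 = speaker B.
STAY, MOVE = math.log(0.90), math.log(0.05)
log_trans = [[STAY if i == j else MOVE for j in range(3)] for i in range(3)]
log_init = [math.log(1.0 / 3.0)] * 3

# Per-frame likelihoods; frame 2 slightly favors speaker B when viewed in isolation.
frame_probs = [
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.4, 0.5],
    [0.1, 0.8, 0.1],
]
log_lik = [[math.log(p) for p in frame] for frame in frame_probs]

framewise = [max(range(3), key=lambda s: f[s]) for f in frame_probs]
path = viterbi(log_lik, log_trans, log_init)
print(framewise)  # [1, 1, 2, 1]
print(path)       # [1, 1, 1, 1]
```

In this toy case the frame-by-frame decision flips to speaker B for one frame, while the decoded path stays with speaker A because two state changes cost more than the small likelihood gain. In a setting like the one the abstract describes, the likelihoods would come from the speaker and non-speech models estimated in the first stage.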

Bibliographic details

Source
Odyssey 2010: The Speaker and Language Recognition Workshop | 2010 | pp. 133-138 | 6 pages
Conference location: Brno (CS)
Authors
Oshry Ben-Harush; Itshak Lapidot; Hugo Guterman;
Author affiliations

Department of Electrical and Computers Engineering Ben-Gurion University of the Negev, Beer-Sheva, Israel;

Department of Electrical and Electronics Engineering Sami Shamoon College of Engineering, Ashdod, Israel;

Department of Electrical and Computers Engineering Ben-Gurion University of the Negev, Beer-Sheva, Israel;

Original format: PDF
Text language: English (eng)
Chinese Library Classification: Speech signal processing

Similar literature

1. Initialization of Iterative-Based Speaker Diarization Systems for Telephone Conversations [J]. Ben-Harush O., Ben-Harush O., Lapidot I. IEEE Transactions on Audio, Speech, and Language Processing. 2012, No. 2
2. Diarization of Telephone Conversations Using Factor Analysis [J]. Kenny P., Reynolds D., Castaldo F. IEEE Journal of Selected Topics in Signal Processing. 2010, No. 6
3. Combining Gaussianized/Non-Gaussianized Features to Improve Speaker Diarization of Telephone Conversations [J]. Gupta V., Kenny P., Ouellet P. IEEE Signal Processing Letters. 2007, No. 12
4. Full-posterior PLDA based speaker diarization of telephone conversations [C]. Yanni Chen, Yonghong Yan, Wei Hong. 2017 First International Conference on Electronics Instrumentation & Information Systems. 2017
5. Robust voice mining techniques for telephone conversations [D]. Manocha, Sandeep. 2006
6. Inferring Social Nature of Conversations from Words: Experiments on a Corpus of Everyday Telephone Conversations [O]. Anthony Stark, Izhak Shafran, Jeffrey Kaye
7. PLDA-Based Diarization of Telephone Conversations [O]. Bulut, Ahmet E.; Demir, Hakan; Isik, Yusuf Ziya. 2017
8. MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations [R]. Reynolds, D. A.; Torres-Carrasquillo, P. 2004
