IEEE International Conference on Rebooting Computing

Speaker Differentiation Using a Convolutional Autoencoder


Abstract

In this work, a deep learning solution for differentiating speaker voices in audio from two microphone sources is presented as a step towards solving the cocktail party problem. A convolutional autoencoder was trained on a small sample of data to associate audio snippets with categorical labels. Audio snippets collected as part of this work were used for training and evaluating the model. The audio was converted to a mel-frequency cepstrum representation prior to classification, and the processed data was labeled according to the person or group of persons speaking. The model was trained and evaluated using data with two, three, four, five, and six categories. The result was a model that recognizes when different people are speaking in a 2-person, 3-person, 4-person, 5-person, and 6-person conversation with accuracies of 99.29%, 97.62%, 96.43%, 93.43%, and 88.1%, respectively. Experimental comparisons between the five versions of the model are presented.
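The abstract outlines a pipeline (audio snippets, converted to mel-frequency cepstral features, fed to a convolutional network that predicts a categorical speaker label) without giving implementation details. The sketch below illustrates one way such a pipeline could look in Python using librosa and PyTorch; the feature dimensions, layer sizes, snippet length, and the use of a plain convolutional encoder with a classification head (the paper's full autoencoder, including its decoder and training procedure, is not described) are assumptions for illustration only.

```python
# Hypothetical sketch of an MFCC + convolutional classifier pipeline.
# All architecture and feature parameters below are assumed, not taken from the paper.
import numpy as np
import librosa
import torch
import torch.nn as nn

N_MFCC = 20           # number of mel-frequency cepstral coefficients (assumed)
SNIPPET_FRAMES = 128  # fixed number of time frames per snippet (assumed)

def snippet_to_mfcc(path, sr=16000):
    """Load an audio snippet and convert it to a fixed-size MFCC 'image'."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)   # shape: (N_MFCC, frames)
    # Pad or truncate along the time axis so every snippet has the same shape.
    if mfcc.shape[1] < SNIPPET_FRAMES:
        mfcc = np.pad(mfcc, ((0, 0), (0, SNIPPET_FRAMES - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :SNIPPET_FRAMES]
    return mfcc.astype(np.float32)

class ConvSpeakerClassifier(nn.Module):
    """Convolutional encoder plus dense head mapping MFCC snippets to speaker labels."""
    def __init__(self, n_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (N_MFCC // 4) * (SNIPPET_FRAMES // 4), 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),   # one logit per speaker category
        )

    def forward(self, x):               # x: (batch, 1, N_MFCC, SNIPPET_FRAMES)
        return self.head(self.encoder(x))

# Example: a 2-speaker model scores one snippet.
model = ConvSpeakerClassifier(n_classes=2)
x = torch.randn(1, 1, N_MFCC, SNIPPET_FRAMES)   # stand-in for snippet_to_mfcc() output
print(model(x).softmax(dim=-1))                 # probability per speaker category
```

For the 3- to 6-speaker versions described in the abstract, only `n_classes` would change; the training labels would identify the person (or group of persons) speaking in each snippet.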
