Published in: Neural computing & applications

New transformed features generated by deep bottleneck extractor and a GMM-UBM classifier for speaker age and gender classification

Abstract

Speaker age and gender classification is one of the most challenging problems in speech signal processing. With recent technological developments, identifying a speaker's age and gender has become necessary for speaker verification and identification systems and for applications such as identifying suspects in criminal cases, improving human-machine interaction, and adapting the music played to people waiting in a queue. Despite intensive studies on extracting descriptive and distinctive features, classification accuracies are still not satisfactory. In this work, a model that generates bottleneck features from a deep neural network, combined with a Gaussian Mixture Model-Universal Background Model (GMM-UBM) classifier, is proposed for the speaker age and gender classification problem. A deep neural network with a bottleneck layer is first trained in an unsupervised manner to compute the initial weights between layers. It is then trained and fine-tuned in a supervised manner to generate transformed mel-frequency cepstral coefficients (T-MFCCs). The GMM-UBM is used to build a GMM for each class, and these models are used to classify speaker age and gender. The aGender database, an age-annotated database of German telephone speech, is used to evaluate the proposed classification system. The newly generated T-MFCCs show potential for significant improvements in speaker age and gender classification with the GMM-UBM classifier. The proposed classification system achieved an overall accuracy of 57.63%, and the highest per-class accuracy, 72.97%, was obtained for adult female speakers.
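The GMM-UBM classification stage described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the per-class GMMs are derived from the UBM, so this sketch assumes the common mean-only MAP adaptation with diagonal covariances, 8 mixture components and a relevance factor of 16. All function names (`train_ubm`, `map_adapt_means`, `classify`) and class labels are hypothetical, and random 13-dimensional vectors stand in for T-MFCC frames; neither the bottleneck extractor nor aGender data is involved.

```python
import numpy as np

# Minimal GMM-UBM sketch. Assumptions (not from the paper): diagonal covariances,
# mean-only MAP adaptation, relevance factor 16, 8 components; names are hypothetical.

def log_gauss_diag(X, means, variances):
    """log N(x_t | mu_k, diag(var_k)) for every frame t and component k -> (T, K)."""
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    return -0.5 * (mahal + log_det[None, :])

def posteriors(X, weights, means, variances):
    """Responsibilities gamma (T, K) and per-frame mixture log-likelihood (T,)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    log_lik = np.logaddexp.reduce(log_p, axis=1)  # log sum_k w_k N(x_t | k)
    return np.exp(log_p - log_lik[:, None]), log_lik

def train_ubm(X, n_components=8, n_iter=20, seed=0):
    """EM training of the universal background model on frames pooled over all classes."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        gamma, _ = posteriors(X, weights, means, variances)
        nk = gamma.sum(axis=0) + 1e-10  # soft counts per component
        weights = nk / nk.sum()
        means = gamma.T @ X / nk[:, None]
        variances = gamma.T @ (X ** 2) / nk[:, None] - means ** 2
        variances = np.maximum(variances, 1e-3)  # variance floor
    return weights, means, variances

def map_adapt_means(X, ubm, relevance=16.0):
    """Derive one class GMM by MAP-adapting only the UBM means to that class's frames."""
    weights, means, variances = ubm
    gamma, _ = posteriors(X, weights, means, variances)
    nk = gamma.sum(axis=0)
    ex = gamma.T @ X / np.maximum(nk, 1e-10)[:, None]  # per-component data mean
    alpha = (nk / (nk + relevance))[:, None]  # weight of class data vs. UBM prior
    return weights, alpha * ex + (1.0 - alpha) * means, variances

def classify(X, class_models):
    """Pick the class whose GMM gives the highest average frame log-likelihood."""
    scores = {c: posteriors(X, *gmm)[1].mean() for c, gmm in class_models.items()}
    return max(scores, key=scores.get), scores

# Demo on synthetic 13-dimensional frames standing in for T-MFCCs.
rng = np.random.default_rng(1)
train = {
    "class_a": rng.normal(0.0, 1.0, size=(500, 13)),
    "class_b": rng.normal(0.5, 1.0, size=(500, 13)),
}
ubm = train_ubm(np.vstack(list(train.values())))
models = {c: map_adapt_means(X, ubm) for c, X in train.items()}
label, scores = classify(rng.normal(0.5, 1.0, size=(200, 13)), models)
print(label, scores)
```

Each test segment is scored by its average per-frame log-likelihood under every adapted class model and assigned to the highest-scoring class; with these synthetic inputs, the segment drawn from the `class_b` distribution is expected to be labelled `class_b`. Subtracting the UBM log-likelihood from every score, as is often done in speaker verification, would not change which class wins.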