Source: U.S. National Institutes of Health literature > Springer Open Choice

New transformed features generated by deep bottleneck extractor and a GMM–UBM classifier for speaker age and gender classification

Abstract

Speaker age and gender classification is one of the most challenging problems in speech signal processing. As technology has developed, identifying a speaker's age and gender has become necessary for speaker verification and identification systems, with applications such as identifying suspects in criminal cases, improving human–machine interaction, and adapting music for people waiting in a queue. Despite intensive studies aimed at extracting descriptive and distinctive features, classification accuracies are still not satisfactory. In this work, a model for generating bottleneck features from a deep neural network and a Gaussian Mixture Model–Universal Background Model (GMM–UBM) classifier are proposed for the speaker age and gender classification problem. A deep neural network with a bottleneck layer is first trained in an unsupervised manner to calculate the initial weights between layers. It is then trained and tuned in a supervised manner to generate transformed mel-frequency cepstral coefficients (T-MFCCs). The GMM–UBM is used to build a GMM for each class, and these models are used to classify speaker age and gender. The age-annotated database of German telephone speech (aGender) is used to evaluate the proposed classification system. The newly generated T-MFCCs show potential to achieve significant improvements in speaker age and gender classification when used with the GMM–UBM classifier. The proposed classification system achieved an overall accuracy of 57.63%. The highest accuracy, 72.97%, was obtained for adult female speakers.
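
The GMM–UBM classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the adaptation details, so the code below assumes a common recipe: each class model is derived from a shared UBM by mean-only MAP adaptation with a relevance factor of 16, and an utterance is assigned to the class whose model gives the highest average frame log-likelihood. The two-component hand-set UBM, the 2-D synthetic vectors standing in for T-MFCC frames, and the class names `class_a` and `class_b` are all hypothetical and are not taken from the paper. In practice the UBM would be trained on pooled data rather than set by hand.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_log_probs(x, gmm):
    """log(weight_k) + log N(x | mean_k, var_k) for each component k."""
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(gmm["weights"], gmm["means"], gmm["vars"])]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under the GMM."""
    return sum(logsumexp(component_log_probs(x, gmm)) for x in frames) / len(frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation of the UBM toward one class's frames (assumed recipe)."""
    k, dim = len(ubm["weights"]), len(ubm["means"][0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        logp = component_log_probs(x, ubm)
        norm = logsumexp(logp)
        for j in range(k):
            post = math.exp(logp[j] - norm)  # responsibility of component j
            counts[j] += post
            for d in range(dim):
                sums[j][d] += post * x[d]
    means = []
    for j in range(k):
        alpha = counts[j] / (counts[j] + relevance)  # data-dependent mixing weight
        data_mean = [s / counts[j] for s in sums[j]] if counts[j] > 0 else ubm["means"][j]
        means.append([alpha * e + (1 - alpha) * m
                      for e, m in zip(data_mean, ubm["means"][j])])
    return {"weights": ubm["weights"], "means": means, "vars": ubm["vars"]}

def classify(frames, class_models):
    """Return the class whose adapted GMM scores the frames highest."""
    return max(class_models, key=lambda c: avg_log_likelihood(frames, class_models[c]))

def sample(rng, center, n, std=0.5):
    """Draw n synthetic frames around a center (stand-in for real features)."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]

# Hypothetical toy setup: hand-set UBM and two synthetic classes.
rng = random.Random(0)
ubm = {"weights": [0.5, 0.5],
       "means": [[-1.0, 0.0], [1.0, 0.0]],
       "vars": [[1.0, 1.0], [1.0, 1.0]]}
centers = {"class_a": [-2.0, 0.5], "class_b": [2.0, -0.5]}
models = {name: map_adapt_means(ubm, sample(rng, c, 200)) for name, c in centers.items()}
test_a = sample(rng, centers["class_a"], 50)
test_b = sample(rng, centers["class_b"], 50)
print(classify(test_a, models), classify(test_b, models))
```

Adapting only the means keeps every class model tied to the same UBM weights and variances, so the class models differ only where the class data pull the component means away from the background model.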