Expert Systems with Applications

Age and gender classification from speech and face images by jointly fine-tuned deep neural networks



Abstract

The classification of a person's age and gender from speech and face images is a challenging task with important real-life applications, and those applications are expected to grow further in the future. Deep neural networks (DNNs) and convolutional neural networks (CNNs) are among the state-of-the-art systems for feature extraction and classification, and have proven very effective on problems with complex feature spaces. In this work, we propose a new cost function for fine-tuning two DNNs jointly. The proposed cost function is evaluated on an age and gender classification task using speech utterances and unconstrained face images. The proposed classifier design consists of two DNNs trained on different feature sets extracted from the same input data. From speech, Mel-frequency cepstral coefficients (MFCCs) together with the fundamental frequency (F0) form the first feature set, and shifted delta cepstral coefficients (SDCs) form the second. From face images, facial appearance forms the first feature set and depth information the second. Jointly training the two DNNs with the proposed cost function improved classification accuracy and reduced over-fitting for both the speech-based and the image-based systems. Extensive experiments were conducted to evaluate the performance and accuracy of the proposed work. Two publicly available databases, the Age-Annotated Database of German Telephone Speech (aGender) and the Adience database, are used to evaluate the proposed system. The overall accuracy of the proposed system is 56.06% for seven speaker classes on aGender, and the overall exact accuracy is 63.78% on the Adience database. (C) 2017 Elsevier Ltd. All rights reserved.
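The abstract does not spell out the proposed joint cost function. As an illustration only, here is a minimal numpy sketch of one plausible form: the sum of the two networks' cross-entropy losses plus a disagreement penalty that couples their output distributions. The coupling term and the weight `lam` are assumptions for this sketch, not the paper's actual formulation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def joint_cost(p1, p2, y, lam=0.5):
    """Hypothetical joint cost for two networks on the same labels.

    p1, p2 : (n, classes) softmax outputs of the two DNNs
    y      : (n,) integer class labels
    lam    : weight of the disagreement penalty (an assumed coupling
             term; the paper's exact cost function is not given here)
    """
    n = len(y)
    ce1 = -np.log(p1[np.arange(n), y] + 1e-12).mean()
    ce2 = -np.log(p2[np.arange(n), y] + 1e-12).mean()
    # penalize the two networks for disagreeing on the same input
    disagree = ((p1 - p2) ** 2).sum(axis=1).mean()
    return ce1 + ce2 + lam * disagree
```

In joint fine-tuning, both networks' gradients would be taken with respect to this single scalar, so each network is pushed both toward the labels and toward consistency with its sibling.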
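The second speech feature set, shifted delta cepstra (SDC), has a standard construction in the speaker- and language-recognition literature: for each frame, k delta vectors computed at successive offsets of P frames (each delta spanning ±d frames) are stacked into one long vector. A minimal numpy sketch under the common 7-1-3-7 parameterization; the paper's exact settings are not given in the abstract:

```python
import numpy as np

def sdc(cep, d=1, P=3, k=7):
    """Shifted delta cepstra from a cepstral matrix.

    cep : (T, N) matrix of cepstral coefficients (e.g. MFCCs),
          one row per frame; N is taken from the input.
    d   : delta half-span, P : shift between deltas, k : deltas stacked.
    Returns a (T, N * k) matrix.
    """
    T, N = cep.shape
    # edge-replicate so every shifted index stays in range
    pad = np.pad(cep, ((d, d + (k - 1) * P), (0, 0)), mode='edge')
    out = np.empty((T, N * k))
    for t in range(T):
        # delta at shift i*P: c(t + i*P + d) - c(t + i*P - d)
        deltas = [pad[t + d + i * P + d] - pad[t + d + i * P - d]
                  for i in range(k)]
        out[t] = np.concatenate(deltas)
    return out
```

With N = 7 cepstral coefficients and k = 7 stacked deltas, each frame yields a 49-dimensional SDC vector.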
