Rank‐weighted reconstruction feature for a robust deep neural network‐based acoustic model

Abstract

In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In the FFDNN-based acoustic model, an input feature is constructed by vectorizing a submatrix created by slicing the feature vectors of the frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as the redundancy or temporal context of the input features. However, it is questionable whether this single parameter alone can sufficiently control the quantity of information. We therefore investigated input feature construction from the perspectives of rank and nullity, and propose herein a rank-weighted reconstruction feature that retains the speech information components while reducing the trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. It reduced the phone error rate in the TIMIT domain from 18.4% to 18.0%, and the word error rate in the WSJ domain from 4.70% to 4.43%.
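The abstract describes two steps: stacking the frames inside a context window into a vectorized submatrix, and reconstructing that submatrix so that dominant speech components are kept while trivial or redundant components are attenuated. The following minimal NumPy sketch illustrates both steps; the SVD-based weighting by normalized singular values and the function names stack_context and rank_weighted_reconstruction are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def stack_context(feats, context):
    # feats: (num_frames, feat_dim) per-frame feature vectors.
    # Returns (num_frames, (2*context + 1) * feat_dim) stacked input features,
    # i.e. each context-window submatrix vectorized into one row.
    num_frames, feat_dim = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    windows = [padded[t:t + 2 * context + 1].reshape(-1)
               for t in range(num_frames)]
    return np.stack(windows)

def rank_weighted_reconstruction(feats, context):
    # Reconstruct each context-window submatrix with its singular components
    # weighted by normalized singular values (a hypothetical weighting),
    # de-emphasizing near-null, redundant directions before vectorization.
    num_frames, feat_dim = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    out = np.empty((num_frames, (2 * context + 1) * feat_dim))
    for t in range(num_frames):
        sub = padded[t:t + 2 * context + 1]            # (2*context + 1, feat_dim)
        u, s, vt = np.linalg.svd(sub, full_matrices=False)
        w = s / s.max()                                # assumed rank weights in [0, 1]
        out[t] = (u @ np.diag(s * w) @ vt).reshape(-1)
    return out

# Usage sketch: 13-dim MFCC-like features for 100 frames, context of +/- 5 frames.
mfcc = np.random.randn(100, 13)
x_plain = stack_context(mfcc, context=5)                 # (100, 143)
x_rank = rank_weighted_reconstruction(mfcc, context=5)   # (100, 143)

In this sketch, both variants yield inputs of the same dimensionality, so the rank-weighted features could feed the same FFDNN input layer as the plain stacked features.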