EURASIP Journal on Audio, Speech, and Music Processing

Phone recognition with hierarchical convolutional deep maxout networks



Abstract

Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) on both low-resource and large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15% relative improvement in word error rate over fully connected deep networks on large-vocabulary recognition tasks. Here, we explore some refinements to CNNs that have not been pursued by other authors. First, the CNN papers published to date have used sigmoid or rectified linear (ReLU) neurons. We experiment with the recently proposed maxout activation function, which has been shown to outperform the rectifier activation function in fully connected DNNs. We show that the pooling operation of CNNs and the maxout function are closely related, so the two technologies can readily be combined to build convolutional maxout networks. Second, we propose to turn the CNN into a hierarchical model. The origins of this approach go back to the era of shallow nets, when the idea of stacking two networks on top of each other was relatively well known. We extend this method by fusing the two networks into one joint deep model with many hidden layers and a special structure. We show that with the hierarchical modelling approach, we can reduce the error rate of the network on an expanded input context. In experiments on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) phone recognition task, we find that a CNN built from maxout units yields a relative phone error rate reduction of about 4.3% over ReLU CNNs. Applying the hierarchical modelling scheme to this CNN results in a further relative phone error rate reduction of 5.5%. Using dropout training, the lowest error rate we obtain on TIMIT is 16.5%, which is currently the best result. Besides experimenting on TIMIT, we also evaluate our best models on a low-resource large-vocabulary task, and we find that all the proposed modelling improvements give consistently better results on this larger database as well.

Keywords: Deep neural network, Convolutional neural network, Maxout, TIMIT
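The claimed kinship between max-pooling and the maxout function is easy to see in code. The NumPy sketch below is not from the paper; all shapes and names are illustrative. It computes a maxout unit as the max over k affine pieces of one input, then obtains the identical value by max-pooling the same k linear responses: both operations reduce a group of linear filter outputs with the same max operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W, b):
    """Maxout unit: max over k affine pieces of the same input.
    W has shape (k, d); b has shape (k,)."""
    return np.max(W @ x + b)

def max_pool_1d(responses, pool_size):
    """Non-overlapping max-pooling over a vector of filter responses."""
    n = len(responses) // pool_size
    return responses[:n * pool_size].reshape(n, pool_size).max(axis=1)

d, k = 40, 3                       # input dim and pieces per unit (illustrative)
x = rng.standard_normal(d)         # one frame of, e.g., filter-bank features
W = rng.standard_normal((k, d))
b = rng.standard_normal(k)

print(maxout(x, W, b))             # max over k linear responses at one position
print(max_pool_1d(W @ x + b, k))   # the same max, written as pooling
```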
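The hierarchical scheme can be sketched in a similar spirit, under our own assumption (not the paper's exact configuration) that the lower network's frame-level outputs are stacked over a window of neighbouring frames before being fed to an upper network; fusing the two networks into one joint deep model then amounts to backpropagating through the whole stack together. The ReLU layers, layer sizes, and context width below are placeholders, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu_layer(x, W, b):
    """One fully connected layer; the paper uses maxout units,
    ReLU is used here only to keep the sketch short."""
    return np.maximum(0.0, x @ W + b)

def init(shape):
    return rng.standard_normal(shape) * 0.1

T, d_in, d_hid, ctx = 100, 40, 64, 2     # frames, dims, +/-2 frame context (assumed)
frames = rng.standard_normal((T, d_in))

# Lower network: maps each frame to a hidden representation.
W1, b1 = init((d_in, d_hid)), np.zeros(d_hid)
lower_out = relu_layer(frames, W1, b1)            # shape (T, d_hid)

# Stack 2*ctx + 1 neighbouring lower outputs -> expanded input context.
padded = np.pad(lower_out, ((ctx, ctx), (0, 0)))
stacked = np.hstack([padded[i:i + T] for i in range(2 * ctx + 1)])

# Upper network: consumes the expanded context; fusing the two nets
# into one joint model means training both stages jointly.
W2, b2 = init((stacked.shape[1], d_hid)), np.zeros(d_hid)
upper_out = relu_layer(stacked, W2, b2)           # shape (T, d_hid)
print(upper_out.shape)
```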
