EURASIP Journal on Audio, Speech, and Music Processing

Phone recognition with hierarchical convolutional deep maxout networks



Abstract

Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) on both low-resource and large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15% relative improvement in word error rate over fully connected deep networks on large-vocabulary recognition tasks. Here, we explore some refinements to CNNs that have not been pursued by other authors. First, the CNN papers published to date have used sigmoid or rectified linear (ReLU) neurons. We experiment with the recently proposed maxout activation function, which has been shown to outperform the rectifier activation function in fully connected DNNs. We show that the pooling operation of CNNs and the maxout function are closely related, so the two technologies can readily be combined to build convolutional maxout networks. Second, we propose to turn the CNN into a hierarchical model. The origins of this approach go back to the era of shallow nets, when the idea of stacking two networks on top of each other was relatively well known. We extend this method by fusing the two networks into one joint deep model with many hidden layers and a special structure. We show that with the hierarchical modelling approach, we can reduce the error rate of the network on an expanded input context. In experiments on the Texas Instruments/Massachusetts Institute of Technology (TIMIT) phone recognition task, we find that a CNN built from maxout units yields a relative phone error rate reduction of about 4.3% over ReLU CNNs. Applying the hierarchical modelling scheme to this CNN results in a further relative phone error rate reduction of 5.5%. Using dropout training, the lowest error rate we obtain on TIMIT is 16.5%, which is currently the best result. Besides experimenting on TIMIT, we also evaluate our best models on a low-resource large-vocabulary task, and we find that all the proposed modelling improvements give consistently better results on this larger database as well.

Keywords: Deep neural network, Convolutional neural network, Maxout, TIMIT
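The claimed kinship between max-pooling and the maxout function is easy to see in code. The NumPy sketch below is not from the paper; all shapes and names are illustrative. It computes a maxout unit as the max over k affine pieces of one input, then obtains the identical value by max-pooling the same k linear responses: both operations reduce a group of linear filter outputs with the same max operator.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxout(x, W, b):
    """Maxout unit: max over k affine pieces of the same input.
    W has shape (k, d); b has shape (k,)."""
    return np.max(W @ x + b)

def max_pool_1d(responses, pool_size):
    """Non-overlapping max-pooling over a vector of filter responses."""
    n = len(responses) // pool_size
    return responses[:n * pool_size].reshape(n, pool_size).max(axis=1)

d, k = 40, 3                       # input dim and pieces per unit (illustrative)
x = rng.standard_normal(d)         # one frame of, e.g., filter-bank features
W = rng.standard_normal((k, d))
b = rng.standard_normal(k)

print(maxout(x, W, b))             # max over k linear responses at one position
print(max_pool_1d(W @ x + b, k))   # the same max, written as pooling
```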
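The hierarchical scheme can be sketched in a similar spirit, under our own assumption (not the paper's exact configuration) that the lower network's frame-level outputs are stacked over a window of neighbouring frames before being fed to an upper network; fusing the two networks into one joint deep model then amounts to backpropagating through the whole stack together. The ReLU layers, layer sizes, and context width below are placeholders, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu_layer(x, W, b):
    """One fully connected layer; the paper uses maxout units,
    ReLU is used here only to keep the sketch short."""
    return np.maximum(0.0, x @ W + b)

def init(shape):
    return rng.standard_normal(shape) * 0.1

T, d_in, d_hid, ctx = 100, 40, 64, 2     # frames, dims, +/-2 frame context (assumed)
frames = rng.standard_normal((T, d_in))

# Lower network: maps each frame to a hidden representation.
W1, b1 = init((d_in, d_hid)), np.zeros(d_hid)
lower_out = relu_layer(frames, W1, b1)            # shape (T, d_hid)

# Stack 2*ctx + 1 neighbouring lower outputs -> expanded input context.
padded = np.pad(lower_out, ((ctx, ctx), (0, 0)))
stacked = np.hstack([padded[i:i + T] for i in range(2 * ctx + 1)])

# Upper network: consumes the expanded context; fusing the two nets
# into one joint model means training both stages jointly.
W2, b2 = init((stacked.shape[1], d_hid)), np.zeros(d_hid)
upper_out = relu_layer(stacked, W2, b2)           # shape (T, d_hid)
print(upper_out.shape)
```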
