
Neural network based feature extraction for speech and image recognition


Abstract

This work investigates features derived from an artificial neural network. These artificial neural network based probabilistic features have become a major component of current state-of-the-art systems for automatic speech recognition and other areas, e.g. image recognition. A detailed study of these features helps to improve the feature extraction and to understand which information in the speech signal is relevant for recognition. Two widely used approaches for integrating the information derived from an artificial neural network are investigated: the tandem and the hybrid approach. This work studies the effect of each approach on recognition performance, measured by word error rate, and on the computational requirements. In addition, a detailed comparison and a discussion of the main advantages of each integration approach are given. Furthermore, novel extensions are proposed that improve the artificial neural network feature extraction and the final recognition performance of the trained systems. These extensions concern the input features used to train the artificial neural network and the topology of the network, and they are independent of the integration method. Different short-term and long-term features model complementary aspects of the speech signal. By combining these different feature sets, the development cycle of the speech recognition system can be simplified. This allows increasing the model complexity of the artificial neural network or of the acoustic model. The topology of an artificial neural network has a large impact on the quality of the features derived from it. This work investigates the hierarchical framework, bottleneck processing, and recurrent neural networks, especially the long short-term memory structure and the training of bidirectional networks.
Furthermore, this work examines cross-lingual artificial neural network features and their impact on the topology and the amount of audio data used to train such features. The training and testing languages of these features differ, and the system development cycle is simplified when such cross-lingual artificial neural network based features are used. In addition, this work analyses different supervised and unsupervised weight pre-training techniques. The initialization of the weights of a deep neural network is critical since the objective function is non-convex. A new unsupervised pre-training technique is developed which allows the loss function to be optimized directly and provides a clear stopping criterion, in contrast to other pre-training techniques such as restricted Boltzmann machines. Finally, this work analyzes the generality of the artificial neural network based feature extraction approach by transferring the concept to different image tasks, namely optical character recognition and automatic sign language recognition. While most results are confirmed, some surprising new results are obtained.
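
The abstract names the tandem and hybrid approaches without spelling them out. As a rough illustration only (not taken from the thesis), the sketch below shows the commonly described difference: in the hybrid approach the network posteriors are divided by state priors to obtain scaled likelihoods for the HMM, while in the tandem approach the log posteriors are used as additional input features for a conventional acoustic model. All function names, the toy values, and the omission of a decorrelation step (e.g. PCA) in the tandem path are assumptions made for this example.

```python
import math


def softmax(logits):
    """Turn raw network outputs into class posteriors p(s | x)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_scaled_log_likelihoods(posteriors, priors):
    """Hybrid sketch: log p(x | s) up to a constant is log p(s | x) - log p(s)."""
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, priors)]


def tandem_features(posteriors, base_features, floor=1e-10):
    """Tandem sketch: append floored log posteriors to the base features.

    A real system would usually decorrelate and reduce these values before
    training the acoustic model; that step is left out here (assumption).
    """
    log_post = [math.log(max(p, floor)) for p in posteriors]
    return list(base_features) + log_post


# Hypothetical toy values: three states, two base acoustic features.
posteriors = softmax([2.0, 1.0, 0.1])
priors = [0.5, 0.3, 0.2]
scores = hybrid_scaled_log_likelihoods(posteriors, priors)
features = tandem_features(posteriors, base_features=[0.4, -1.2])
```

In this sketch, `scores` would be passed to an HMM decoder in place of Gaussian mixture likelihoods, whereas `features` would be the input on which a separate acoustic model is trained.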

Bibliographic record

  • Author: Plahl, Christian
  • Year: 2014
  • Original format: PDF
  • Language: English
