Robust signal processing methods for miniature acoustic sensing, separation, and recognition.

Abstract

Acoustic sensing is one of several emerging areas where micro-scale integration promises significant breakthroughs. However, separating, localizing, and recognizing acoustic sources with micro-scale microphone arrays poses a significant challenge because of fundamental limitations imposed by the physics of sound propagation. The smaller the distance between the recording elements, the harder it is to measure localization and separation cues, and hence the harder it is to recognize the acoustic sources of interest. The objective of this research is to investigate signal processing and machine learning techniques for noise-robust acoustic target recognition using miniature microphone arrays.

The first part of this research focuses on designing "smart" analog-to-digital conversion (ADC) algorithms that can enhance acoustic cues in sub-wavelength microphone arrays. Many source separation algorithms fail to deliver robust performance when applied to signals recorded with high-density sensor arrays, where the distance between sensor elements is much smaller than the wavelength of the signals. This can be attributed to the limited dynamic range of the sensor (determined by the analog-to-digital conversion), which is insufficient to overcome the artifacts caused by large cross-channel redundancy, non-homogeneous mixing, and the high dimensionality of the signal space. We propose a novel framework that overcomes these limitations by integrating statistical learning directly with the signal measurement (analog-to-digital) process, which enables high-fidelity separation of linear instantaneous mixtures. At the core of the proposed ADC approach is a min-max optimization of a regularized objective function that yields a sequence of quantized parameters which asymptotically tracks the statistics of the input signal. Experiments with synthetic and real recordings demonstrate consistent performance improvements when the proposed approach is used as the analog-to-digital front-end to conventional source separation algorithms.

The second part of this research investigates a novel speech feature extraction algorithm that can recognize auditory targets (keywords and speakers) from noisy recordings. The features, known as Sparse Auditory Reproducing Kernel (SPARK) coefficients, are extracted under the hypothesis that the noise-robust information in a speech signal is embedded in a subspace spanned by sparse, regularized, over-complete, non-linear, and phase-shifted gammatone basis functions. The feature extraction algorithm computes kernel functions between the speech data and a pre-computed set of phase-shifted gammatone functions, followed by a simple pooling technique (a "MAX" operation). In this work, we present experimental results for a hidden Markov model (HMM) based speech recognition system whose performance has been evaluated on the standard AURORA 2 dataset. The results demonstrate that the SPARK features deliver significant and consistent improvements in recognition accuracy over the standard ETSI STQ WI007 DSR benchmark features. We have also verified the noise-robustness of the SPARK features for the task of speaker verification. Experimental results on the NIST SRE 2003 dataset show significant improvements over a benchmark based on standard Mel-frequency cepstral coefficients (MFCCs).
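The kernel-then-pool step described for the SPARK features can be illustrated with a rough sketch. This is not the thesis implementation: the fourth-order gammatone with an ERB-based bandwidth, the quadratic polynomial kernel, pooling across phase shifts, and the function names `gammatone` and `spark_like_features` are all assumptions chosen for illustration, and the sparsity and regularization steps mentioned in the abstract are omitted.

```python
import numpy as np

def gammatone(fc, phase, fs, n_samples, order=4):
    """Unit-norm gammatone function (assumed 4th order, ERB-based bandwidth)."""
    t = np.arange(n_samples) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)   # equivalent rectangular bandwidth in Hz
    b = 1.019 * erb                           # bandwidth parameter
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
    return g / np.linalg.norm(g)

def spark_like_features(frame, fs, centre_freqs, phases):
    """Kernel responses against phase-shifted gammatones, MAX-pooled per frequency.

    The quadratic polynomial kernel (1 + <x, g>)^2 is an assumed stand-in
    for whichever kernel the thesis actually uses.
    """
    x = frame / (np.linalg.norm(frame) + 1e-12)   # normalise the speech frame
    feats = np.empty(len(centre_freqs))
    for i, fc in enumerate(centre_freqs):
        responses = [(1.0 + x @ gammatone(fc, p, fs, len(x))) ** 2 for p in phases]
        feats[i] = max(responses)                 # "MAX" pooling over phase shifts
    return feats

# Usage: a frame that is itself a 1 kHz gammatone responds most at 1 kHz.
fs = 8000
frame = gammatone(1000.0, 0.0, fs, 200)
features = spark_like_features(frame, fs, [500.0, 1000.0, 2000.0], [0.0, np.pi / 2])
```

In a full pipeline, one such feature vector would be computed per short frame and passed to the recognizer; how the thesis frames, regularizes, and post-processes the coefficients is not stated in the abstract.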

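Bibliographic record

  • Author: Fazel, Amin
  • Author affiliation: Michigan State University
  • Degree-granting institution: Michigan State University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 138 p.
  • Total pages: 138
  • Original format: PDF
  • Language of text: English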