Biologically inspired auditory attention models with applications in speech and audio processing.

Abstract

Humans can precisely process and interpret complex scenes in real time, despite the tremendous amount of stimuli impinging on the senses and the limited resources of the nervous system. One of the key enablers of this capability is a neural mechanism called "attention". This dissertation develops computational algorithms that emulate human auditory attention and demonstrates their effectiveness in spoken language and audio processing applications.

Attention allows primates to allocate their neural resources efficiently to locations of interest, either to interpret a scene precisely or to search for a target. In a scene, some stimuli are inherently salient within their context, and they attract attention in a bottom-up manner. Saliency-driven attention is a rapid, bottom-up, task-independent process: it detects objects that perceptually pop out of a scene because they differ significantly from their neighbors. The second form of attention is a top-down, task-dependent process that uses prior knowledge and learned past experience to focus attention on target locations in a scene and thereby enhance processing.

One of the primary contributions of this thesis is a novel bottom-up auditory attention model. An auditory saliency map is proposed to model saliency-driven bottom-up auditory attention. The feature extraction structure of the attention model is inspired by the processing stages of the human auditory system. Experiments demonstrate that the bottom-up auditory attention model can successfully detect prominent syllables and words in speech. In addition, the model is used to detect salient acoustic events in complex acoustic scenes.
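The abstract names an auditory saliency map but does not give its computation. The following is a minimal sketch under stated assumptions, not the dissertation's model. It borrows the center-surround idea from Itti-Koch-style visual saliency models and treats a small spectrogram-like matrix (rows are frequency channels, columns are time frames) as a single intensity-like feature channel. Box-filter smoothing serves as a crude stand-in for multi-scale filtering. The function names (`box_blur`, `center_surround`, `saliency_map`, `salient_frames`) and the radius pairs are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def box_blur(x: Matrix, radius: int) -> Matrix:
    """Mean over a (2r+1) x (2r+1) window, clipped at the borders.

    Assumption: a crude stand-in for smoothing at one scale of a pyramid.
    """
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total, count = 0.0, 0
            for ii in range(max(0, i - radius), min(rows, i + radius + 1)):
                for jj in range(max(0, j - radius), min(cols, j + radius + 1)):
                    total += x[ii][jj]
                    count += 1
            out[i][j] = total / count
    return out


def center_surround(x: Matrix, center: int, surround: int) -> Matrix:
    """|fine-scale map - coarse-scale map|: large where a cell differs from its neighborhood."""
    c, s = box_blur(x, center), box_blur(x, surround)
    return [[abs(a - b) for a, b in zip(row_c, row_s)] for row_c, row_s in zip(c, s)]


def normalize(m: Matrix) -> Matrix:
    """Scale a map so its maximum is 1 (an all-zero map stays zero)."""
    peak = max(max(row) for row in m)
    if peak == 0.0:
        return [[0.0] * len(row) for row in m]
    return [[v / peak for v in row] for row in m]


def saliency_map(spec: Matrix,
                 scales: Sequence[Tuple[int, int]] = ((0, 2), (1, 4))) -> Matrix:
    """Average the normalized center-surround maps over (center, surround) radius pairs."""
    maps = [normalize(center_surround(spec, c, s)) for c, s in scales]
    rows, cols = len(spec), len(spec[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(cols)]
            for i in range(rows)]


def salient_frames(sal: Matrix, top_k: int) -> List[int]:
    """Indices of the top_k time frames, scoring each frame by its peak saliency across frequency."""
    n_freq, n_time = len(sal), len(sal[0])
    scores = [max(sal[f][t] for f in range(n_freq)) for t in range(n_time)]
    return sorted(range(n_time), key=lambda t: scores[t], reverse=True)[:top_k]


# Toy "spectrogram": 8 frequency channels x 20 frames, silent except one loud cell.
spec = [[0.0] * 20 for _ in range(8)]
spec[4][10] = 10.0
sal = saliency_map(spec)
print(salient_frames(sal, 1))  # [10]
```

For a silent input with one loud time-frequency cell, the saliency peak falls on that cell, and `salient_frames(sal, 1)` returns its time frame. Keeping only the top-scoring frames illustrates how a downstream classifier could ignore the non-salient parts of a signal. The number of frames kept is an assumed free parameter in this sketch.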
Using only the selected salient events for acoustic scene classification is shown to perform better than conventional audio content processing algorithms, which process the whole signal and treat every part of it as equally important.

The next contribution of this thesis is an analysis of the effect of task-dependent influences on auditory attention. For this purpose, a biologically plausible top-down model is proposed. The top-down attention model shares the same front-end as the bottom-up auditory attention model and biases the features to mimic task influences on neurons. In addition to acoustic cues, the model also incorporates higher-level task-dependent cues such as lexical and syntactic information. The combined model achieves the highest performance on prominent syllable/word detection tasks, which indicates the importance of a priori task information.

Finally, an attention shift decoding method inspired by human speech recognition is proposed. Traditional automatic speech recognition systems decode speech fully and consecutively from left to right. The attention shift decoding method instead decodes speech non-consecutively, guided by reliability criteria. To detect reliable regions of speech, a new set of features is proposed. Attention shift decoding improves automatic speech recognition performance.
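The abstract describes the top-down model only as biasing the shared front-end features toward the task. One simple way to picture this is assumed here and is not taken from the dissertation. Each feature map is summarized by a single number, and one task-specific weight per feature is learned with logistic regression, so that task-relevant features are up-weighted and irrelevant ones are suppressed. The toy feature values, the labels, and the function names (`train_task_weights`, `task_biased_score`) are hypothetical, standing in for a prominent/non-prominent decision.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_task_weights(features: Sequence[Sequence[float]],
                       labels: Sequence[int],
                       lr: float = 0.5,
                       epochs: int = 500) -> Tuple[List[float], float]:
    """Fit one weight per feature (plus a bias) by batch gradient descent on the logistic loss."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) - y
            for k in range(dim):
                grad_w[k] += err * x[k]
            grad_b += err
        # Average the gradients over the batch, then take one step.
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def task_biased_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Probability of the task target (e.g. 'prominent') given the feature summaries x."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b)


# Hypothetical per-syllable summaries of two front-end feature maps:
# feature 0 tracks prominence in this toy set, feature 1 is only weakly related.
X = [[1.0, 0.2], [0.9, 0.8], [0.8, 0.1], [0.1, 0.9], [0.2, 0.3], [0.0, 0.7]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_task_weights(X, y)
print([round(v, 2) for v in w], round(b, 2))
```

On this toy data, feature 0 separates the two classes, so training gives it the largest weight. That weighting is the sense in which the task "biases" the shared features here. Higher-level cues, such as lexical or syntactic indicators, could be appended to the feature vector in the same way. This only loosely mirrors the combined model described in the abstract.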

Bibliographic details

  • Author: Kalinli, Ozlem
  • Affiliation: University of Southern California
  • Degree-granting institution: University of Southern California
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 153
  • Original format: PDF
  • Language: English