Conference: IEEE International Solid-State Circuits Conference

9.9 A Background-Noise and Process-Variation-Tolerant 109nW Acoustic Feature Extractor Based on Spike-Domain Divisive-Energy Normalization for an Always-On Keyword Spotting Device


Abstract

In mobile and edge devices, always-on keyword spotting (KWS) is an essential function for detecting wake-up words. Recent works have achieved extremely low power dissipation, down to ~500 nW [1]. However, most of them adopt noise-dependent training, i.e., training for a specific signal-to-noise ratio (SNR) and noise type [1], so their accuracy degrades at SNR levels and noise types that were not targeted in training (Fig. 9.9.1, top left). To improve robustness, so-called noise-independent training can be considered, which uses training data that covers all the possible SNR levels and noise types [2]. This approach is challenging for an ultra-low-power device, however, because it demands a large neural network to learn all the possible features. A neural network of a fixed size has a limited memory capacity, and its accuracy plateaus if it has to learn more than that limit allows (Fig. 9.9.1, top right). Biological acoustic systems, on the other hand, are known to employ a simpler process, called divisive energy normalization (DN), to maintain accuracy even in varying noise conditions [3]. In this work, therefore, we adopt DN and prototype a normalized acoustic feature extractor chip (NAFE) in 65 nm. The NAFE takes an acoustic signal from a microphone and produces spike-rate-coded features. We pair the NAFE with a spiking neural network (SNN) classifier chip [4] to create an end-to-end KWS system. The proposed system achieves 89 to 94% accuracy across SNRs from -5 to 20 dB and four different noise types on HeySnips [5], while the baseline without DN achieves a much lower accuracy of 71 to 87%. The NAFE consumes up to 109 nW and the KWS system 570 nW.
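As a rough illustration of why divisive energy normalization helps with varying input levels, the sketch below applies a generic textbook form of DN to a vector of per-band energies: each band is divided by the energy pooled across all bands. This is not the NAFE's spike-domain circuit, which the abstract does not detail; the function name `divisive_normalize`, the stabilizing constant `sigma`, and the sample band energies are all assumptions made for illustration.

```python
def divisive_normalize(band_energies, sigma=1e-6):
    """Generic divisive normalization (illustrative, not the chip's method).

    Each band energy is divided by the mean energy across all bands.
    `sigma` is a hypothetical small constant that avoids division by zero.
    """
    pooled = sum(band_energies) / len(band_energies)
    return [e / (sigma + pooled) for e in band_energies]


# Hypothetical band energies for the same sound at two input levels.
quiet = [0.2, 0.4, 0.1, 0.3]
loud = [10.0 * e for e in quiet]  # same spectral shape, 10x the energy

normalized_quiet = divisive_normalize(quiet)
normalized_loud = divisive_normalize(loud)

print(normalized_quiet)
print(normalized_loud)
```

Because the overall level cancels in the division, the two normalized vectors come out nearly identical, so a downstream classifier would see features that depend mostly on spectral shape rather than on absolute energy.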