Over the past three decades, considerable effort has gone into formulating "ideal" acoustic features that represent the speech signal in a discriminative and compact manner while remaining robust to adverse conditions and invariant to speaker differences. An effective way to make automatic speech recognition (ASR) systems invariant to speaker differences is to apply speaker normalization to the input features. The most popular speaker normalization technique is vocal tract length normalization (VTLN). However, its implementation requires substantial computational resources, which makes it impractical for real-time and embedded ASR systems. In this paper, we propose a new speaker normalization algorithm, Built-in Speaker Normalization (BISN), which is performed on-the-fly within the newly proposed PMVDR acoustic front-end and significantly reduces the computational resources required, enabling its use within contemporary ASR systems. Evaluations on an in-car extended digit recognition task showed that the on-the-fly implementation of the BISN algorithm produced a relative word error rate (WER) reduction of 24% compared to a baseline with no speaker normalization.