In this paper, we propose a method for detecting a wake-up-word (WUW) with a microphone array for human-robot interaction. Two features are used for detection: the consistency of the spatial eigenspaces formed by the speech source at different frequencies, and the resonant curve similarity of the WUW. Each feature is processed and detected separately, and the final decision is obtained by cascading the individual outcomes using a Bayes risk detector. The proposed method maintains a high recognition rate under very low signal-to-noise ratio (SNR) conditions. In addition, the method can estimate the direction of arrival of the sound source, and the architecture is easy to extend: detectors based on other features can be added to the cascade to further improve the recognition rate.
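To make the cascading idea concrete, the following is a minimal sketch of a two-stage cascade of Bayes risk (likelihood-ratio) detectors. It is not the implementation used in this work. The scalar feature scores, the Gaussian score models under each hypothesis, the prior, the costs, and all function names are assumptions introduced purely for illustration.

```python
import math

def bayes_threshold(p_h1, cost_false_alarm=1.0, cost_miss=1.0):
    """Likelihood-ratio threshold that minimizes Bayes risk,
    assuming zero cost for correct decisions (illustrative assumption)."""
    return ((1.0 - p_h1) * cost_false_alarm) / (p_h1 * cost_miss)

def log_gaussian(x, mean, std):
    """Log-density of a 1-D Gaussian (assumed score model)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def bayes_detect(score, h0, h1, threshold):
    """Decide H1 (WUW present) if log p(score|H1) - log p(score|H0)
    exceeds log(threshold). h0 and h1 are (mean, std) pairs."""
    log_lr = log_gaussian(score, *h1) - log_gaussian(score, *h0)
    return log_lr > math.log(threshold)

def cascade_detect(scores, stages):
    """Accept only if every stage accepts; stop at the first rejection.
    scores[i] is the hypothetical feature score fed to stages[i]."""
    for score, (h0, h1, threshold) in zip(scores, stages):
        if not bayes_detect(score, h0, h1, threshold):
            return False
    return True

# Hypothetical stage models: (H0 model, H1 model, threshold).
thr = bayes_threshold(p_h1=0.5)
stages = [
    ((0.2, 0.1), (0.8, 0.1), thr),  # e.g. eigenspace-consistency score
    ((0.3, 0.1), (0.9, 0.1), thr),  # e.g. resonant-curve-similarity score
]
accepted = cascade_detect([0.9, 0.85], stages)
```

In this sketch, extending the cascade amounts to appending another `(h0, h1, threshold)` entry to `stages` together with its score, which mirrors the extensibility described above.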