A robust, real-time, single-channel speech keyword detection method, comprising the following steps: receiving noisy speech in electronic form; converting the time-domain speech signal into a frequency-domain signal, frame by frame, by means of the short-time Fourier transform; processing the frequency-domain signal with a Mel filter to obtain Mel features as the acoustic features; passing the Mel features through a neural network frame by frame and applying a normalized exponential (softmax) function to obtain confidence information for each keyword; when the confidence of a keyword exceeds a predefined threshold, splicing the current frame with several preceding frames to serve as the output of the neural network; and passing this spliced output sequentially through an attention mechanism and a feed-forward deep neural network, followed by the normalized exponential function, to obtain sentence-level confidence information for each keyword, wherein a keyword is considered detected when its confidence value exceeds the predefined threshold and not detected otherwise. The method maintains a high wake-up rate in noisy environments, is widely applicable, and can greatly reduce the false alarm rate of the neural network, thereby improving keyword detection performance.
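The two-stage flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the network is a stand-in with random weights, and all names and values (`log_mel`, `detect_keywords`, the 25 ms / 10 ms framing, 40 Mel bands, the hidden size, the context length, and the convention that class 0 means "no keyword") are assumptions made for illustration. The sketch also assumes that "splicing" means stacking the network's hidden outputs for the current and preceding frames.

```python
from collections import deque
import numpy as np

def softmax(x):
    """Normalized exponential function over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = to_hz(np.linspace(to_mel(0.0), to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fb[m - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fb[m - 1, k] = (hi - k) / max(hi - mid, 1)
    return fb

def log_mel(signal, fb, frame_len=400, hop=160, n_fft=512):
    """Frame-by-frame STFT power spectrum followed by Mel filtering."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return np.log(power @ fb.T + 1e-6)

def init_params(rng, n_mels=40, hidden=32, n_classes=3):
    """Random stand-in weights; a real system would load trained ones."""
    return {
        "W1": rng.normal(0, 0.1, (n_mels, hidden)),        # frame-level layer
        "W_out": rng.normal(0, 0.1, (hidden, n_classes)),  # frame-level output
        "v_att": rng.normal(0, 0.1, hidden),               # attention scorer
        "W_ff": rng.normal(0, 0.1, (hidden, hidden)),      # feed-forward layer
        "W_sent": rng.normal(0, 0.1, (hidden, n_classes)), # sentence-level output
    }

def detect_keywords(signal, p, fb, frame_thr=0.5, sent_thr=0.5, context=30):
    """Return (frame index, keyword id, sentence-level confidence) tuples."""
    history = deque(maxlen=context + 1)   # current + previous frames
    detections = []
    for t, x in enumerate(log_mel(signal, fb)):
        h = np.tanh(x @ p["W1"])
        history.append(h)
        frame_post = softmax(h @ p["W_out"])
        # Stage 1: frame-level confidence; class 0 is assumed to be "no keyword".
        if frame_post[1:].max() <= frame_thr:
            continue
        # Stage 2: splice recent frames, attention-pool, then feed-forward DNN.
        H = np.stack(list(history))
        alpha = softmax(H @ p["v_att"])   # attention weights over frames
        pooled = alpha @ H
        sent_post = softmax(np.maximum(pooled @ p["W_ff"], 0.0) @ p["W_sent"])
        k = 1 + int(np.argmax(sent_post[1:]))
        if sent_post[k] > sent_thr:
            detections.append((t, k, float(sent_post[k])))
    return detections
```

The second stage runs only on frames that pass the first threshold, so the more expensive sentence-level check serves as a verifier for frame-level candidates; this is one plausible reading of how the method lowers the false alarm rate.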