Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing long short-term memory (LSTM) layers having compressed gating functions. One of the systems includes a first LSTM layer having gates configured to generate, for each of a plurality of time steps, a respective intermediate gate output vector by multiplying a gate input vector and a gate parameter matrix. The gate parameter matrix for at least one of the gates is a structured matrix or is defined by a compressed parameter matrix and a projection matrix. By including the compressed LSTM layer in the recurrent neural network, the recurrent neural network is configured to process data more efficiently and use less data memory. A recurrent neural network with a compressed LSTM layer can be effectively trained to achieve word error rates comparable to those achieved with uncompressed recurrent neural networks.
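The factorization described above can be illustrated with a minimal sketch. This is not the patented implementation; the names `compressed_gate`, `Z`, `P`, and the chosen dimensions are illustrative assumptions. The idea shown is that a full gate parameter matrix of shape n x (m + n) is replaced by the product of a compressed parameter matrix `Z` (n x r) and a projection matrix `P` (r x (m + n)), reducing the parameter count from n*(m + n) to n*r + r*(m + n) when the rank r is small:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 8, 6, 3  # input size, gate output size, compression rank (r < n)

# Compressed factorization of one gate's parameter matrix:
# Z is the compressed parameter matrix, P the projection matrix.
# In the compressed scheme, P can be shared across several gates.
Z = rng.standard_normal((n, r))
P = rng.standard_normal((r, m + n))
b = np.zeros(n)  # gate bias

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compressed_gate(x, h_prev):
    """Intermediate gate output: sigmoid((Z @ P) @ [x; h_prev] + b),
    computed as Z @ (P @ v) so the full matrix is never materialized."""
    v = np.concatenate([x, h_prev])  # gate input vector
    return sigmoid(Z @ (P @ v) + b)

x = rng.standard_normal(m)
h_prev = np.zeros(n)
g = compressed_gate(x, h_prev)
print(g.shape)  # (6,)
```

The parameter saving here is n*(m + n) = 84 entries for the full matrix versus n*r + r*(m + n) = 60 for the factored form; the gap widens rapidly at realistic layer sizes.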