Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining neural network architectures. One of the methods includes: receiving training data for a neural network task; receiving architecture data; assigning, to each of a plurality of network operators, a utilization variable indicating a likelihood of the network operator being utilized in a neural network; and generating an optimized neural network for performing the neural network task by repeatedly performing the following: sampling a selected set of network operators; and training a neural network having an architecture defined by the selected set of network operators, wherein the training comprises: computing an objective function evaluating (i) a measure of computational cost of the neural network and (ii) a measure of performance of the neural network on the neural network task associated with the training data; and adjusting the respective current values of the utilization variables and the respective current values of the neural network parameters.
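The sample-train-adjust loop described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the hypothetical `search` function, the toy regression task, the three candidate operators and their costs, the choice of one Bernoulli logit per operator as its utilization variable, the squared-error performance measure, the weighted-sum objective, and the score-function (REINFORCE-style) update for the utilization variables.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def search(data, operators, costs, steps=2000, lr_w=0.05, lr_a=0.1,
           cost_weight=0.05, seed=0):
    """Toy architecture search (illustrative assumptions throughout)."""
    rng = random.Random(seed)
    k = len(operators)
    alphas = [0.0] * k    # utilization variables, stored as logits
    weights = [0.0] * k   # one network parameter per operator
    baseline = 0.0        # running average of the objective
    for _ in range(steps):
        # Sample a selected set of operators from the utilization variables.
        probs = [sigmoid(a) for a in alphas]
        mask = [1 if rng.random() < p else 0 for p in probs]
        # Evaluate the sampled network on one training example.
        x, y = rng.choice(data)
        feats = [op(x) for op in operators]
        pred = sum(m * w * f for m, w, f in zip(mask, weights, feats))
        err = pred - y
        # Objective: (ii) performance term plus weighted (i) cost term.
        cost = sum(m * c for m, c in zip(mask, costs))
        objective = err * err + cost_weight * cost
        # Adjust network parameters of the selected operators (gradient step).
        for i in range(k):
            if mask[i]:
                weights[i] -= lr_w * 2.0 * err * feats[i]
        # Adjust utilization variables with a score-function estimate:
        # d/d(alpha_i) log P(mask) = mask_i - probs_i for a Bernoulli logit.
        advantage = objective - baseline
        for i in range(k):
            alphas[i] -= lr_a * advantage * (mask[i] - probs[i])
            alphas[i] = max(-10.0, min(10.0, alphas[i]))  # keep logits bounded
        baseline = 0.9 * baseline + 0.1 * objective
    return [sigmoid(a) for a in alphas], weights


# Hypothetical task: fit y = 2x using candidate operators x, x^2, sin(x).
data = [(i / 10.0, 2.0 * i / 10.0) for i in range(-10, 11)]
operators = [lambda x: x, lambda x: x * x, math.sin]
costs = [1.0, 1.0, 3.0]

probs, weights = search(data, operators, costs)
print("utilization probabilities:", probs)
print("operator weights:", weights)
```

In this sketch, a final architecture could be read off by keeping the operators whose utilization probability exceeds some threshold; the abstract does not say how the optimized network is ultimately selected, so that step is left out.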