Cascading variational autoencoder ("VAE") models can be used to detect robot collisions while a robot is performing a task. For a current state of the robot, various implementations include a VAE used to generate a latent space of the current state, and a predictor network used to generate a predicted latent space for the current state. A collision can be determined based on a difference between the latent space for the current state and the predicted latent space for the current state.
展开▼