A system trains a machine learning model to generate a high-resolution depth image. During a training phase, the system generates an accurate three dimensional reconstruction of a training scene such that the machine learning model is iteratively trained to minimize an error between the higher resolution depth image and the depth information in the accurate three dimensional reconstruction. During a real-time phase, the system applies the trained machine learning model to images captured from a scene of interest and generates a higher resolution depth image with higher accuracy. Thus, the higher resolution depth image can be subsequently used to solve computer vision problems.
展开▼