We present a new method to estimate the 6DOF pose of an event camera based solely on the event stream. Our method first creates an event image from a list of events that occur within a very short time interval, then a Stacked Spatial LSTM Network (SP-LSTM) is used to learn and estimate the camera pose. Our SP-LSTM comprises a CNN that learns deep features from the event images and a stack of LSTMs that learns spatial dependencies in the image feature space. We show that spatial dependencies play an important role in the pose estimation task and that the SP-LSTM can effectively learn this information. Experimental results on the public dataset show that our approach outperforms recent methods by a substantial margin. Overall, our proposed method reduces the position error by a factor of about 6 and the orientation error by a factor of about 3 compared with the state of the art. The source code and trained models will be released.
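The event-image construction mentioned above can be sketched as follows. This is a minimal illustration only: the signed-polarity accumulation scheme, the `(x, y, t, polarity)` event layout, and the name `events_to_image` are assumptions for exposition, not necessarily the paper's exact formulation.

```python
import numpy as np

def events_to_image(events, height, width, t_start, t_end):
    """Accumulate events falling in [t_start, t_end) into a 2D event image.

    Each event is a tuple (x, y, t, polarity) with polarity in {-1, +1}.
    Signed accumulation (positive events add, negative events subtract)
    is one common choice; the paper may use a different scheme.
    """
    img = np.zeros((height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t_start <= t < t_end:
            img[y, x] += p  # accumulate polarity at the event's pixel
    return img

# Toy example: two positive events at one pixel, one negative elsewhere,
# and one event outside the time window (ignored).
events = [(1, 2, 0.10, +1), (1, 2, 0.20, +1), (0, 0, 0.50, -1), (3, 3, 1.50, +1)]
img = events_to_image(events, height=4, width=4, t_start=0.0, t_end=1.0)
# img[2, 1] == 2.0, img[0, 0] == -1.0, img[3, 3] == 0.0
```

The resulting fixed-size image can then be fed to a conventional CNN backbone, which is what makes frame-based architectures such as the SP-LSTM applicable to asynchronous event data.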