Convolutional Neural Networks (CNNs) are successfully used for various visual perception tasks, including bounding-box object detection, semantic segmentation, optical flow, depth estimation, and visual SLAM. Generally, these tasks are explored and modeled independently. In this paper, we present a joint multi-task network design for learning several such tasks simultaneously. The main advantages are increased run-time efficiency through network parameters shared across tasks, scalability to add more tasks by leveraging existing features, and better generalization through inductive transfer. We provide a systematic taxonomy of multi-task learning CNN topologies based on an extensive survey of architectures, loss functions, and training strategies. We classify Deep Multi-Task Learning (DMTL) topologies into five categories: parallel task branches, sequential task branches, soft parameter sharing, hierarchical representations, and recurrent topologies. The proposed network jointly learns object detection and semantic segmentation and is implemented in the Keras and TensorFlow frameworks. The architecture consists of a ResNet-10 encoder as a common trunk and two task-specific decoders: a YOLO-like decoder for object detection and an FCN8-like decoder for semantic segmentation. We demonstrate the prototype on wide-angle fisheye cameras, which are becoming popular for automated driving because of their large field of view (FOV). We believe this is the first work to demonstrate DMTL on surround-view fisheye cameras.
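To make the joint-training idea concrete, the sketch below shows how losses from the two task decoders sharing one trunk are typically combined into a single objective. The weighting scheme and values here are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch of joint multi-task loss weighting for a shared-trunk
# network with a detection head and a segmentation head.
# The weights w_det / w_seg are hypothetical; the paper does not specify them here.

def joint_loss(det_loss: float, seg_loss: float,
               w_det: float = 1.0, w_seg: float = 1.0) -> float:
    """Weighted sum of per-task losses; gradients of this scalar flow back
    through both decoders into the shared ResNet-style trunk."""
    return w_det * det_loss + w_seg * seg_loss

# Example: equal weighting of the two tasks.
total = joint_loss(det_loss=2.0, seg_loss=3.0)  # -> 5.0 with unit weights
```

In practice the relative weights control how much each task shapes the shared features; tuning them (or learning them) is a common way to balance detection accuracy against segmentation quality.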