Sensing technology has advanced dramatically in recent years, and a wide variety of sensors are now used within a single system, as in automated driving and robotics. However, as the number of sensors in a system grows, how to fuse the information they provide becomes a problem. When humans perceive their environment, information from the five senses is first transmitted to and processed in sensory areas of the brain, such as the visual and auditory areas. The processed information is then passed to the association areas, where it is fused. A similar architecture is desirable for sensor fusion in robots. In this paper, we propose a method that extracts feature values from each sensor using deep learning and then fuses those features. The proposed system combines lipreading and speech recognition, drawing on visual and auditory information respectively. We aim to realize sensor fusion by extracting feature values and recognizing words with a Convolutional Neural Network (CNN) for each of the visual and auditory inputs, and then feeding the recognition results into a neural network that fuses them.
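The fusion step described above can be sketched as follows. This is a minimal NumPy illustration of the late-fusion idea only, not the authors' implementation: random vectors stand in for the per-word recognition scores that the trained visual and auditory CNNs would produce, and the fusion network's weights (here a single untrained dense layer) and the vocabulary size `NUM_CLASSES` are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 10  # hypothetical vocabulary size

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-ins for the recognition results of the two CNN branches:
# each would actually be a trained CNN's class scores over the same words.
visual_scores = softmax(rng.normal(size=NUM_CLASSES))    # lipreading branch
auditory_scores = softmax(rng.normal(size=NUM_CLASSES))  # speech branch

# Fusion network: a single dense layer over the concatenated recognition
# results (weights are random here; in practice they would be trained).
W = rng.normal(size=(NUM_CLASSES, 2 * NUM_CLASSES))
b = np.zeros(NUM_CLASSES)
fused = softmax(W @ np.concatenate([visual_scores, auditory_scores]) + b)

predicted_word = int(np.argmax(fused))  # index of the fused word decision
```

The key design point is that fusion happens after each modality has been recognized independently, mirroring the sensory-area-then-association-area flow described above.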