Deep learning methods achieve excellent performance in the fields of computer vision and natural language processing through end-to-end supervised training dependent on large scale labeled datasets. However, the existing methods are often targeted for single modal data, ignoring the inherent structure of the data with the lack of theoretical support. Therefore, the wavelet theory based deep convolution networks, the structured deep learning and the multi-modal deep learning are discussed in this paper to demonstrate the potential methods of the combination of deep learning techniques, wavelet theory and structure prediction, and the viable mechanism for extending to multi-modal data is explored as well.%深度学习方法依赖于大规模的标签数据,通过端到端的监督训练,在计算机视觉、自然语言处理领域都取得优异性能.但是,现有方法通常针对单一模态数据,忽视数据的内在结构,缺乏理论支撑.针对上述问题,文中从基于小波核学习的深度滤波器组网络设计、基于结构化学习的深度学习、基于多模态学习的深度学习3个角度阐述结合深度学习方法与小波理论、结构化预测的潜在方法,以及其拓展到多模态数据的可行机制.
展开▼