We propose a novel deep learning approach to solve simultaneous alignment and recognition problems (referred to as “Sequence-to-sequence” learning). We decompose the problem into a series of specialised expert systems referred to as SubUNets. The spatio-temporal relationships between these SubUNets are then modelled to solve the task, while remaining trainable end-to-end.

The approach mimics human learning and educational techniques, and has a number of significant advantages. SubUNets allow us to inject domain-specific expert knowledge into the system regarding suitable intermediate representations. They also allow us to implicitly perform transfer learning between different interrelated tasks, which also allows us to exploit a wider range of more varied data sources. In our experiments we demonstrate that each of these properties serves to significantly improve the performance of the overarching recognition system, by better constraining the learning problem.

The proposed techniques are demonstrated in the challenging domain of sign language recognition. We demonstrate state-of-the-art performance on hand-shape recognition (outperforming previous techniques by more than 30%). Furthermore, we are able to obtain comparable sign recognition rates to previous research, without the need for an alignment step to segment out the signs for recognition.