We introduce a novel mixed chaiacter-word architecture to improve Chinese sentence representations, by utilizing rich semantic information of word internal structures. Our architecture uses two key strategies. The first is a mask gate on characters, learning the relation among characters in a word. The second is a maxpooling operation on words, adaptively finding the optimal mixture of the atomic and compositional word representations Finally, the proposed architecture is applied to various sentence composition models, which achieves substantial performance gains over baseline models on sentence similarity task.
展开▼