Natural language processing is all about ambiguities. In machine translation, tokenization and parsing mistakes due to segmentation and structural ambiguities potentially introduce translation errors. A well-known solution is to provide more alternatives by using compact representations such as lattice and forest. In this talk, I will introduce a technique that goes beyond using lattices and forests, which integrates tokenization, parsing, and translation in one system. Therefore, tokenization, parsing, and translation can interact with and benefit each other in a discriminative framework. Experimental results show that such integration significantly improves tokenization and translation performance.
展开▼