We introduce a statistical model for the morphology of natural languages. Since a word consists of a root and, optionally, a prefix and a suffix, we represent each word with three vector components: one for the root, one for the prefix, and one for the suffix. Because morphology encodes important semantic notions and syntactic instructions, a new Content vector c can be associated with each sentence. This vector can be computed online and used to find the most likely derivation tree in a grammar. The model was inspired by an analysis of Amis, an Austronesian language with a rich morphology.
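The sketch below is a minimal illustration of the three-component word representation and of an online, word-by-word update of a sentence-level vector. It is not the authors' model. The abstract does not say how the components are encoded or how c is updated, so the following are assumptions: each component is an indicator (one-hot) vector over a small inventory, and c is a running sum of the affix indicators. The inventories and the names Word, word_vector and ContentVector are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    """A word segmented into (prefix, root, suffix); a missing affix is None."""
    prefix: Optional[str]
    root: str
    suffix: Optional[str]


def one_hot(item: Optional[str], inventory: List[str]) -> List[int]:
    """Indicator vector over an inventory; all zeros for a missing or unknown item."""
    return [1 if item == x else 0 for x in inventory]


def word_vector(w: Word, prefixes: List[str], roots: List[str],
                suffixes: List[str]) -> Tuple[List[int], List[int], List[int]]:
    """Three vector components for one word: prefix, root, suffix (assumed one-hot)."""
    return (one_hot(w.prefix, prefixes), one_hot(w.root, roots),
            one_hot(w.suffix, suffixes))


class ContentVector:
    """Hypothetical online content vector: running sum of affix indicators."""

    def __init__(self, prefixes: List[str], suffixes: List[str]) -> None:
        self.prefixes = prefixes
        self.suffixes = suffixes
        self.c = [0] * (len(prefixes) + len(suffixes))

    def update(self, w: Word) -> List[int]:
        """Fold one word into c as it arrives, without revisiting earlier words."""
        vec = one_hot(w.prefix, self.prefixes) + one_hot(w.suffix, self.suffixes)
        self.c = [a + b for a, b in zip(self.c, vec)]
        return self.c


# Usage with placeholder (non-Amis) affixes and roots.
prefixes, roots, suffixes = ["pre-A", "pre-B"], ["r1", "r2"], ["suf-X"]
sentence = [Word("pre-A", "r1", None), Word(None, "r2", "suf-X"),
            Word("pre-A", "r2", "suf-X")]
cv = ContentVector(prefixes, suffixes)
for w in sentence:
    cv.update(w)
print(cv.c)  # [2, 0, 2]
```

Because c is updated one word at a time, it is available at every prefix of the sentence. A parser could consult it incrementally, but the scoring of derivation trees is outside the scope of this sketch.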