This paper describes the application of a new model to learn probabilistic context-free grammars (PCFGs) from a treebank corpus. The model estimates the probabilities according to a generalized k-gram scheme for trees. It allows for faster parsing, considerably decreases the perplexity of the test samples, and tends to give more structured and refined parses. In addition, it supports several smoothing techniques, such as backing-off or interpolation, which are used to avoid assigning zero probability to any sentence.
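
As a rough illustration of the idea, the sketch below estimates rule probabilities from a toy treebank by relative frequency, once conditioned on the left-hand side alone and once conditioned additionally on the parent label, and then combines the two estimates by linear interpolation. It is a minimal sketch under stated assumptions, not the scheme defined in the paper: the nested-tuple tree encoding, the `ROOT` context label, the helper names `collect`, `train` and `rule_prob`, and the fixed weight `lam` are all hypothetical choices made for this example.

```python
from collections import Counter

# Assumed encoding: a tree is (label, child, ...); a leaf is a plain string.

def collect(tree, parent="ROOT"):
    """Return (parent_label, lhs, rhs) for every internal node of a tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    events = [(parent, label, rhs)]
    for child in children:
        if not isinstance(child, str):
            events.extend(collect(child, label))
    return events

def train(treebank):
    """Count rules with and without the parent label as extra context."""
    plain, plain_ctx = Counter(), Counter()  # counts keyed on lhs only
    cond, cond_ctx = Counter(), Counter()    # counts keyed on (parent, lhs)
    for tree in treebank:
        for parent, lhs, rhs in collect(tree):
            plain[(lhs, rhs)] += 1
            plain_ctx[lhs] += 1
            cond[(parent, lhs, rhs)] += 1
            cond_ctx[(parent, lhs)] += 1
    return plain, plain_ctx, cond, cond_ctx

def rule_prob(model, parent, lhs, rhs, lam=0.7):
    """Interpolate the parent-conditioned and the plain relative frequencies.

    lam is an illustrative fixed weight; in practice it would be tuned on
    held-out data.
    """
    plain, plain_ctx, cond, cond_ctx = model
    p_plain = plain[(lhs, rhs)] / plain_ctx[lhs] if plain_ctx[lhs] else 0.0
    ctx = cond_ctx[(parent, lhs)]
    p_cond = cond[(parent, lhs, rhs)] / ctx if ctx else 0.0
    return lam * p_cond + (1 - lam) * p_plain

# Toy treebank with two parsed sentences (invented for this example).
treebank = [
    ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "stars"), ("VP", ("V", "shine"))),
]
model = train(treebank)

# NP -> she never occurs under VP, yet the interpolated estimate is non-zero.
p_unseen_context = rule_prob(model, "VP", "NP", ("she",))
p_seen_context = rule_prob(model, "S", "NP", ("she",))
```

In this toy setting the expansion NP -> she is never observed below a VP, so the parent-conditioned estimate alone would be zero, while the interpolated value stays positive because the less specific estimate contributes. A rule that is unseen in every context would still receive zero here, so a complete model would need a further fallback level.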