Because PCFGs are, as their name suggests, context-free, they cannot encode many dependencies that occur in natural language, such as the dependency between a determiner and its noun, and so they overgenerate phrases like "those cat". One formalism that can capture many of the dependencies that PCFGs cannot is the probabilistic tree-substitution grammar (PTSG). Because PTSGs allow larger subtrees to be used as grammar rules, they can model natural language better, but they are also more difficult to induce from a corpus. In this paper, I show how PTSGs can be used to represent dependencies between determiners and nouns, and I present a novel method for inducing a PTSG from a parsed corpus.
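To make the determiner-noun example concrete, the following is a minimal sketch and not the induction method presented in the paper. It ignores rule probabilities and assumes a toy grammar with Penn Treebank style tags (NN, NNS); the `expand` function and the tuple encoding of trees are hypothetical choices made only for illustration. It enumerates the noun phrases licensed by depth-one rules (a CFG) and by larger tree fragments (a TSG).

```python
from itertools import product

# A tree is a tuple (label, child, ...). A child that is a str is a word;
# a child that is a 1-tuple such as ("NN",) is an open substitution site.

def expand(tree, fragments):
    """Return every word sequence derivable from `tree` by substitution."""
    label, *children = tree
    if not children:
        # Open substitution site: plug in every fragment rooted at this label.
        return [words
                for frag in fragments if frag[0] == label
                for words in expand(frag, fragments)]
    # Internal node: combine the yields of the children from left to right.
    options = [[(c,)] if isinstance(c, str) else expand(c, fragments)
               for c in children]
    return [sum(combo, ()) for combo in product(*options)]

# CFG: every rule is a depth-one fragment, so the determiner and the noun
# are chosen independently of each other.
cfg = [
    ("NP", ("DT",), ("NN",)),
    ("NP", ("DT",), ("NNS",)),
    ("DT", "that"),
    ("DT", "those"),
    ("NN", "cat"),
    ("NNS", "cats"),
]

# TSG: each NP fragment already contains its determiner, so the determiner
# is tied to the number of the noun slot that remains open.
tsg = [
    ("NP", ("DT", "that"), ("NN",)),
    ("NP", ("DT", "those"), ("NNS",)),
    ("NN", "cat"),
    ("NNS", "cats"),
]

cfg_phrases = {" ".join(words) for words in expand(("NP",), cfg)}
tsg_phrases = {" ".join(words) for words in expand(("NP",), tsg)}

print(sorted(cfg_phrases))
print(sorted(tsg_phrases))
```

Under these assumptions the depth-one rules license "those cat" and "that cats" alongside the grammatical phrases, while the fragment-based grammar licenses only "that cat" and "those cats". A PTSG would additionally attach a probability to each fragment.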