In this paper, we report our work on extracting lexicalized tree adjoining grammars (LTAGs) from partially bracketed corpora. The algorithm first fully brackets the corpora, then extracts elementary trees (etrees), and finally filters out invalid etrees using linguistic knowledge. We show that although the set of extracted etrees may not be complete enough to cover the whole language, this incompleteness has little impact on parsing.