Multiword Expressions (MWEs) display linguistic and statistical markedness that may influence the effectiveness of techniques for identifying them automatically in texts. While parsing-based techniques for MWE identification are considered better at handling long-distance dependencies, passivization and internal modification, statistics-based techniques use association measures to detect statistical markedness regardless of syntactic form. In this paper we compare these two approaches, focusing on nominal compounds in Portuguese. We evaluate the accuracy of each method and propose combining the strengths of both for increased accuracy.