Much recent research activity has focused on automatically extracting linguistic information from on-line corpora. There is no question that great progress has been made in applying machine learning to computational linguistics. We believe that, now that the field has matured, it is time to look inward and carefully examine the basic tenets of the corpus-based learning paradigm. The goal of this paper is to raise a number of issues that challenge the paradigm, in the hope of stimulating introspection and discussion that will make the field even stronger.