This dissertation has two objectives. The first is to present the formal foundations of a cue-based model of learning and to show how it can be used to learn subcategorization frames. The second is to give further evidence of the role of the input in both automatic and human language acquisition. Two implementations of the model are presented. The first is a set of algorithms that identify arguments, predicates, and subcategorization frames, bootstrapping from proper names and a small subset of pronouns. The second assumes no initial cues and learns only from distributional regularities in the input; it presents a procedure for cue extraction and then demonstrates how the extracted cues can be used in categorization and subcategorization. The two implementations achieved overall accuracies of 98% and 97%, respectively. This level of performance shows that the proposed cue-based learning model can capture language-specific properties given minimal or no initial knowledge. The model is important for three main reasons. First, it provides a Natural Language Processing tool that acquires linguistic knowledge from minimal or no initial knowledge, with accuracy significantly higher than that of previous methods that assume much more initial knowledge. Second, it provides evidence that language can be acquired from a small set of cues in the input by means of distributional analysis. Finally, the model is language-independent, which makes it extensible to other linguistic tasks and other languages with little parameterization.