In this computational study of the acquisition of syntactic containment constructions, we suggest that emergent structures in the sensorimotor space may help limit the search for linguistic patterns to a much smaller corpus, in which the earliest glimmerings of syntax may originate. We consider an early learner exposed to perceptual data (video) with co-occurring word-segmented commentaries. In the initial stages of an incremental statistical learning process, content words (e.g. object names) are acquired. Then, some prototypical containment categories emerge by clustering spatial relations in the perceptual input. These situational regularities are used to restrict attention to sentences uttered during containment situations. By using k-gram and path alignment models on this limited corpus, we show that such a system may begin to learn word categories as well as syntacto-semantic constructions such as “into the X”/[IN(tr,X)]. A key element of the process is the structure in which this perceptual knowledge is organized. The term “image schema” has often been used to refer to such structures, but it has been interpreted in widely divergent ways. We view the image schema as a category oracle: initially limited to spatio-temporal situations, it may extend, with increasing exposure to language, to new situations.
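
The corpus-restriction and k-gram steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the toy commentary, the boolean containment flag (which in the study would come from clustering spatial relations in the video, and is simply assumed as given here), and the function names `kgrams`, `containment_kgrams`, and `slot_fillers` are all hypothetical. Path alignment is not shown.

```python
from collections import Counter


def kgrams(tokens, k):
    """Return all contiguous k-token windows of a token list."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


def containment_kgrams(commentary, k=2):
    """Count k-grams only over sentences flagged as containment situations."""
    counts = Counter()
    for sentence, during_containment in commentary:
        if during_containment:
            counts.update(kgrams(sentence.split(), k))
    return counts


def slot_fillers(commentary, frame):
    """Count the words that directly follow `frame` in the restricted corpus."""
    fillers = Counter()
    n = len(frame)
    for sentence, during_containment in commentary:
        if not during_containment:
            continue
        tokens = sentence.split()
        # Stop one position early so that tokens[i + n] always exists.
        for i in range(len(tokens) - n):
            if tuple(tokens[i:i + n]) == frame:
                fillers[tokens[i + n]] += 1
    return fillers


# Hypothetical toy data: (word-segmented sentence, containment flag).
commentary = [
    ("the ball goes into the box", True),
    ("the square moves into the box", True),
    ("the circle is big", False),
    ("it went into the room", True),
]

counts = containment_kgrams(commentary, k=2)
# Keep only bigrams seen at least three times in the restricted corpus.
frequent = [gram for gram, n in counts.most_common() if n >= 3]
fillers = slot_fillers(commentary, ("into", "the"))
```

On this toy data the only bigram that recurs across all three containment sentences is ("into", "the"), and the words filling the slot after it ("box", "room") form a candidate word category for X in a frame such as “into the X”.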