Compounds differ in the degree to which they are semantically compositional (compare, e.g., "carwash", "handbag", "beefcake" and "humbug"). Since even relatively transparent compounds such as "carwash" may leave the uninitiated reader with uncertainty about the intended meaning (soap for washing cars? a place where you can get your car washed?), an efficient way of retrieving the meaning of a compound is to use the compound's form as an access key for its meaning. However, in psychology, the view has become popular that at the earliest stage of lexical processing in reading, a morpho-orthographic decomposition into morphemes would necessarily take place. Theorists ascribing to obligatory decomposition appear to have some hash coding scheme in mind, with the constituents providing entry points to a form of table look-up (e.g., Taft & Forster, 1976). Leaving aside the question of whether such a hash coding scheme would be computationally efficient as well as the question how the putative morpho-orthographic representations would be learned, my presentation focuses on the details of lexical processing as revealed by an eye-tracking study of the reading of English compounds in sentences. A careful examination of the eye-tracking record with generalized additive modeling (Wood, 2006), combined with computational modeling using naive discrimination learning (Baayen, Milin, Filipovic, Hendrix, & Marelli, 2011) revealed that how far the eye moved into the compound is co-determined by the compound's lexical distributional properties, including the cosine similarity of the compound and its head in document vector space (as measured with latent semantic analysis, Landauer & Dumais, 1997). This indicates that compound processing is initiated already while the eye is fixating on the preceding word, and that even before the eye has landed on the compound, processes discriminating the meaning of the compound from the meaning of its head have already come into play. Once the eye lands on the compound, two very different reading signatures emerge, which critically depend on the letter trigrams spanning the morpheme boundary (e.g., "ndb" and "dba" in "handbag"). From a discrimination learning perspective, these boundary trigrams provide the crucial (and only) orthographic cues for the compound's (idiosyncratic) meaning. If the boundary trigrams are sufficiently strongly associated with the compound's meaning, and if the eye lands early enough in the word, a single fixation suffices. Within 240 ms (of which 80 ms involve planning the next saccade) the compound's meaning is discriminated well enough to proceed to the next word. However, when the boundary trigrams are only weakly associated with the compound's meaning, multiple fixations become necessary. In this case, without the availability of the critical orthographic cues, the eye-tracking record bears witness to the cognitive system engaging not only bottom-up processes from form to meaning, but also top-down guessing processes that are informed by the a-priori probability of the head and the cosine similarities of the compound and its constituents in semantic vector space. These results challenge theories positing obligatory decomposition with hash coding, as hash coding predicts insensitivity to semantic transparency, contrary to fact. Our results also challenge theories positing blind look-up based on compounds' orthographic forms. Although this might be computationally efficient, the eye can't help seeing parts of the whole. In summary, reality is much more complex, with deep pre-arrival parafoveal processing followed by either efficient discrimination driven by the boundary trigrams (within 140 ms), or by an inefficient decompositional process (requiring an additional 200 ms) that seeks to make sense of the conjunction of head and modifier.
展开▼