Annual Meeting of the Association for Computational Linguistics

When parsing makes things worse: An eye-tracking study of English compounds

Abstract

Compounds differ in the degree to which they are semantically compositional (compare, e.g., "carwash", "handbag", "beefcake", and "humbug"). Since even relatively transparent compounds such as "carwash" may leave the uninitiated reader uncertain about the intended meaning (soap for washing cars? a place where you can get your car washed?), an efficient way of retrieving the meaning of a compound is to use the compound's form as an access key for its meaning. However, the view has become popular in psychology that, at the earliest stage of lexical processing in reading, a morpho-orthographic decomposition into morphemes necessarily takes place. Theorists subscribing to obligatory decomposition appear to have some hash coding scheme in mind, with the constituents providing entry points to a form of table look-up (e.g., Taft & Forster, 1976). Leaving aside the question of whether such a hash coding scheme would be computationally efficient, as well as the question of how the putative morpho-orthographic representations would be learned, my presentation focuses on the details of lexical processing as revealed by an eye-tracking study of the reading of English compounds in sentences. A careful examination of the eye-tracking record with generalized additive modeling (Wood, 2006), combined with computational modeling using naive discrimination learning (Baayen, Milin, Filipovic, Hendrix, & Marelli, 2011), revealed that how far the eye moves into the compound is co-determined by the compound's lexical distributional properties, including the cosine similarity between the compound and its head in document vector space (as measured with latent semantic analysis; Landauer & Dumais, 1997). This indicates that compound processing is initiated while the eye is still fixating the preceding word, and that even before the eye has landed on the compound, processes discriminating the meaning of the compound from the meaning of its head have already come into play.
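
Naive discrimination learning builds on the Rescorla-Wagner learning rule, in which form cues (here, letter trigrams) compete to predict meaning outcomes. The following is a minimal sketch, not the model used in the study: it simply iterates the Rescorla-Wagner update over a hypothetical three-word toy lexicon rather than, for example, solving for the equilibrium weights directly. The function names `letter_trigrams` and `rescorla_wagner`, the parameters `rate`, `lam`, and `epochs`, and the lexicon itself are illustrative assumptions.

```python
from collections import defaultdict


def letter_trigrams(word):
    """Overlapping letter trigrams, e.g. 'hand' -> ['han', 'and']."""
    return [word[i:i + 3] for i in range(len(word) - 2)]


def rescorla_wagner(events, rate=0.1, lam=1.0, epochs=200):
    """Learn cue-to-outcome weights with the Rescorla-Wagner update rule.

    events: list of (cues, outcomes) pairs; cues and outcomes are sets of strings.
    Returns a dict mapping (cue, outcome) to an association weight.
    """
    weights = defaultdict(float)
    all_outcomes = set().union(*(outcomes for _, outcomes in events))
    for _ in range(epochs):
        for cues, outcomes in events:
            for outcome in all_outcomes:
                # Total support the cues present in this event give the outcome.
                activation = sum(weights[(cue, outcome)] for cue in cues)
                # Target is lam if the outcome occurs, 0.0 if it is absent.
                target = lam if outcome in outcomes else 0.0
                delta = rate * (target - activation)
                # Only cues present in the event have their weights updated.
                for cue in cues:
                    weights[(cue, outcome)] += delta
    return weights


# Hypothetical toy lexicon: form cues are letter trigrams, outcomes are meanings.
lexicon = {"handbag": "HANDBAG", "hand": "HAND", "bag": "BAG"}
events = [(set(letter_trigrams(w)), {m}) for w, m in lexicon.items()]
weights = rescorla_wagner(events)

for cue in letter_trigrams("handbag"):
    print(cue, round(weights[(cue, "HANDBAG")], 3))
```

In this toy setup, "ndb" and "dba" should each settle near 0.5 as cues for HANDBAG, while "han", "and", and "bag" are driven toward zero, because those trigrams also occur in learning events where HANDBAG is absent. Cues shared with the constituents thus lose out to the cues unique to the compound.
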
Once the eye lands on the compound, two very different reading signatures emerge, which critically depend on the letter trigrams spanning the morpheme boundary (e.g., "ndb" and "dba" in "handbag"). From a discrimination learning perspective, these boundary trigrams provide the crucial (and only) orthographic cues to the compound's (idiosyncratic) meaning. If the boundary trigrams are sufficiently strongly associated with the compound's meaning, and if the eye lands early enough in the word, a single fixation suffices: within 240 ms (of which 80 ms involve planning the next saccade), the compound's meaning is discriminated well enough to proceed to the next word. However, when the boundary trigrams are only weakly associated with the compound's meaning, multiple fixations become necessary. In this case, without the critical orthographic cues available, the eye-tracking record bears witness to the cognitive system engaging not only bottom-up processes from form to meaning, but also top-down guessing processes informed by the a priori probability of the head and by the cosine similarities between the compound and its constituents in semantic vector space. These results challenge theories positing obligatory decomposition with hash coding, as hash coding predicts insensitivity to semantic transparency, contrary to fact. Our results also challenge theories positing blind look-up based on compounds' orthographic forms: although such look-up might be computationally efficient, the eye can't help seeing parts of the whole. In summary, reality is much more complex, with deep pre-arrival parafoveal processing followed either by efficient discrimination driven by the boundary trigrams (within 140 ms) or by an inefficient decompositional process (requiring an additional 200 ms) that seeks to make sense of the conjunction of head and modifier.
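
Which trigrams count as "boundary trigrams" follows mechanically from where the morpheme boundary falls. A minimal sketch, assuming the boundary position is known in advance and that cues are plain overlapping letter trigrams without word-edge markers; the function name `boundary_trigrams` and its `boundary` parameter are hypothetical.

```python
def boundary_trigrams(compound, boundary):
    """Return the letter trigrams that straddle the morpheme boundary.

    compound: the compound as a single string, e.g. "handbag".
    boundary: index of the first letter of the second constituent,
              e.g. 4 for "hand|bag".
    """
    if not 0 < boundary < len(compound):
        raise ValueError("boundary must fall strictly inside the word")
    # A trigram starting at i covers positions i, i+1, i+2. It straddles the
    # boundary when it starts before the boundary and ends at or after it,
    # i.e. boundary - 2 <= i < boundary, and it must also fit inside the word.
    start = max(0, boundary - 2)
    stop = min(boundary, len(compound) - 2)
    return [compound[i:i + 3] for i in range(start, stop)]


print(boundary_trigrams("handbag", 4))
print(boundary_trigrams("carwash", 3))
```

For "hand|bag" this yields the two trigrams named in the text; these are the only trigrams that occur in neither "hand" nor "bag" on their own, which is why they can serve as cues specific to the compound.
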