The Construction of Meaning: The role of context in corpus-based approaches to language modeling

Abstract

This dissertation presents a framework for statistically modeling words and sentences. It focuses on the role of context in learning semantic representations from a corpus. In recent years, approaches like Latent Semantic Analysis (LSA) (Deerwester et al., 1990; Landauer et al., 1997) and Probabilistic Topic Models (LDA) (Blei et al., 2003; Griffiths and Steyvers, 2002, 2007; Hofmann, 1999) have both enjoyed success in the psycholinguistics community as theories of meaning and models of language understanding. They serve as important components of information retrieval, machine translation, and document summarization systems, as well as several other applications. However, sentences have a rich set of semantic and syntactic features that these models cannot represent accurately, because the models rest on an order-independent bag-of-words assumption. This dissertation develops a model that takes syntagmatic and paradigmatic constraints into account and provides a better model for sentence processing.

The Construction Integration II (CI-II) model (Kintsch and Mangalath, 2010) is a cognitively plausible computational account of how language is acquired and stored as representations in long-term memory, which are then retrieved contextually to generate meaning in working memory. Semantic constraints are modeled using LSA, the Topics Model, and context co-occurrence probabilities. Syntactic constraints are modeled using n-grams and Dependency Grammars (De Marneffe et al., 2006; Collins, 1999; Covington, 2001; Eisner, 1996; Hall et al., 2004). In short, I show how text is structurally decomposed and combined with the comprehender's prior knowledge in order to understand the text. The dissertation demonstrates how the expressiveness gained from explicitly modeling context leads to a better word sense disambiguation process.

This dissertation develops a metric based on tree edit distance (Bille, 2005; Kouylekov and Magnini, 2005), called Dependency Edit Distance, that structurally decomposes sentences into dependency relations and measures similarity in terms of the semantic and syntactic cost of transforming one sentence into the other. It further applies supervised machine learning techniques to these measures, computed between labelled pairs of sentences, to build models whose predictive accuracy matches that of human raters. The long-term goal of this research is to map this model into software that helps students learn in an instructional environment capable of assessing their comprehension. I show data from two experiments in which student responses were automatically graded; the results suggest that such a practical realization is feasible.
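To make the contrast between a bag-of-words view and a dependency-based view concrete, the following is a minimal sketch in Python. It is not the Dependency Edit Distance defined in the dissertation: it replaces tree edit distance with a much simpler unit-cost count of insertions and deletions over sets of dependency triples, and the parses, the `Triple` type, and the function names `bag_of_words` and `relation_edit_cost` are all hypothetical, hand-written for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

# A dependency relation written as (head, relation label, dependent).
# Assumption: triples are supplied by hand here; no parser is involved.
Triple = Tuple[str, str, str]


def bag_of_words(tokens: List[str]) -> Counter:
    """Order-independent word counts, as in a bag-of-words model."""
    return Counter(tokens)


def relation_edit_cost(a: Set[Triple], b: Set[Triple]) -> int:
    """Toy stand-in for an edit distance over dependency structure.

    Every triple in `a` that is missing from `b` counts as one deletion,
    and every triple in `b` that is missing from `a` counts as one
    insertion. Real tree edit distance also handles substitutions and
    tree shape, and the abstract mentions semantic costs as well.
    """
    return len(a - b) + len(b - a)


# "dog bites man" versus "man bites dog": same words, different roles.
tokens_1 = ["dog", "bites", "man"]
tokens_2 = ["man", "bites", "dog"]

parse_1 = {("bites", "nsubj", "dog"), ("bites", "dobj", "man")}
parse_2 = {("bites", "nsubj", "man"), ("bites", "dobj", "dog")}

same_bag = bag_of_words(tokens_1) == bag_of_words(tokens_2)
cost_different = relation_edit_cost(parse_1, parse_2)
cost_identical = relation_edit_cost(parse_1, parse_1)

print(same_bag, cost_different, cost_identical)
```

Under these assumptions the two sentences are indistinguishable as bags of words, while the relation-level comparison assigns them a nonzero cost, which is the kind of word-order sensitivity the abstract argues bag-of-words models lack.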

Bibliographic record

  • Author

    Mangalath Praful

  • Year 2010
  • Original format PDF
