The construction of meaning: The role of context in corpus-based approaches to language modeling.

Abstract

This dissertation presents a framework for statistically modeling words and sentences, focusing on the role of context in learning semantic representations from a corpus. In recent years, approaches such as Landauer's Latent Semantic Analysis (LSA) [15, 49] and Probabilistic Topic Models (LDA) [34, 8, 28, 68] have been well received in the psycholinguistics community, both as theories of meaning and as models of language understanding. They serve as important components of information retrieval, machine translation, and document summarization systems, among several other applications. However, because these models rest on an order-independent bag-of-words assumption, they cannot accurately represent the rich set of semantic and syntactic features that sentences carry. This dissertation develops a model that takes these syntagmatic and paradigmatic constraints into account and provides a better model of sentence processing.

The Construction Integration II (CI-II) model of Kintsch and Mangalath [46] is a cognitively plausible computational account of how language is acquired and stored as representations in long-term memory, which are then retrieved contextually to generate meaning in working memory. Semantic constraints are modeled using LSA, the Topics Model, and context co-occurrence probabilities. Syntactic constraints are modeled using n-grams and dependency grammars [9, 11, 12, 19, 37, 36]. In short, I show how text is structurally decomposed and combined with the comprehenders' prior knowledge in order to understand it. The dissertation also demonstrates how the added expressiveness of explicitly modeling context leads to better word sense disambiguation.

This dissertation develops a metric based on tree edit distance [6, 48], called Dependency Edit Distance, that structurally decomposes sentences into dependency relations and measures their similarity in terms of the semantic and syntactic cost of transforming one sentence into the other.
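
As a rough illustration of why word order matters here, the Python sketch below contrasts a bag-of-words cosine with an ordered tree edit distance over two hand-built dependency trees for "dog bites man" and "man bites dog". It is a hedged toy, not the dissertation's implementation: the `Node` type, the relation labels, the unit insert/delete cost, and the hypothetical `relabel_cost` (0.5 for a different word as a crude semantic cost, 0.5 for a different relation as a syntactic cost) are all assumptions. A fuller version would presumably derive the semantic part from a vector model such as LSA. The distance uses the standard recursive forest formulation of ordered tree edit distance with memoization, which is simple but much slower than Zhang-Shasha on large trees.

```python
import math
from collections import Counter
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class Node:
    """A dependency-tree node: a word, its relation to its head, and ordered dependents."""
    word: str
    rel: str
    children: tuple = ()


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity of raw word-count vectors (word order is ignored)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * cb[word] for word, count in ca.items())
    na = sum(v * v for v in ca.values())
    nb = sum(v * v for v in cb.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def relabel_cost(x, y):
    """Hypothetical cost: 0.5 per mismatch in word (semantic) and relation (syntactic)."""
    return (0.5 if x.word != y.word else 0.0) + (0.5 if x.rel != y.rel else 0.0)


def tree_edit_distance(t1, t2, indel=1.0):
    """Ordered tree edit distance via the recursive forest formulation, memoized."""

    @lru_cache(maxsize=None)
    def forest(f1, f2):
        if not f1 and not f2:
            return 0.0
        if not f1:  # only insertions remain: insert the rightmost root of f2
            v = f2[-1]
            return forest(f1, f2[:-1] + v.children) + indel
        if not f2:  # only deletions remain: delete the rightmost root of f1
            u = f1[-1]
            return forest(f1[:-1] + u.children, f2) + indel
        u, v = f1[-1], f2[-1]
        return min(
            forest(f1[:-1] + u.children, f2) + indel,  # delete u; its children take its place
            forest(f1, f2[:-1] + v.children) + indel,  # insert v
            forest(f1[:-1], f2[:-1])                   # map u to v: match the remaining forests,
            + forest(u.children, v.children)           # match their subtrees,
            + relabel_cost(u, v),                      # and pay to relabel u as v
        )

    return forest((t1,), (t2,))


# "dog bites man" vs. "man bites dog", with hand-built (hypothetical) parses.
t1 = Node("bites", "root", (Node("dog", "nsubj"), Node("man", "dobj")))
t2 = Node("bites", "root", (Node("man", "nsubj"), Node("dog", "dobj")))

print(bow_cosine("dog bites man".split(), "man bites dog".split()))  # 1.0
print(tree_edit_distance(t1, t2))  # 1.0
print(tree_edit_distance(t1, t1))  # 0.0
```

Both sentences contain exactly the same words, so the bag-of-words cosine cannot tell them apart, while the tree distance is nonzero because the subject and object nodes must each be relabeled.
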
The dissertation further applies supervised machine learning techniques to these measures, computed between labeled pairs of sentences, to build models whose predictive accuracy matches that of human raters. The long-term goal of this research is to map this model into software that helps students learn in an instructional environment capable of assessing their comprehension. I show data from two experiments in which student responses were automatically graded; the results show great potential for such a practical realization.
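
The supervised step can be sketched as a small logistic regression trained on similarity features of labeled sentence pairs. This is a hedged illustration only: the abstract does not name the learning algorithm, so logistic regression is a stand-in; the two features (a dependency edit distance and a bag-of-words cosine) and the helper `train_logistic` are hypothetical; and the numbers are made-up toy values, not data from the two experiments.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent on the mean log-loss.

    Each feature column is standardized first; returns a function mapping a raw
    feature row to the predicted probability of label 1.
    """
    n, d = len(features), len(features[0])
    stats = []
    for j in range(d):
        col = [row[j] for row in features]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        stats.append((mean, std))

    def scale(row):
        return [(row[j] - m) / s for j, (m, s) in enumerate(stats)]

    xs = [scale(row) for row in features]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict(row):
        return sigmoid(sum(wj * xj for wj, xj in zip(w, scale(row))) + b)

    return predict


# Made-up toy features per (student answer, reference answer) pair:
# [dependency edit distance, bag-of-words cosine]; label 1 = a rater accepted it.
pairs = [[0.0, 1.00], [0.5, 0.95], [1.0, 0.90], [3.0, 0.40], [3.5, 0.30], [4.0, 0.20]]
labels = [1, 1, 1, 0, 0, 0]

predict = train_logistic(pairs, labels)
print(round(predict([0.8, 0.92]), 3))  # new pair close to the accepted ones
print(round(predict([3.8, 0.24]), 3))  # new pair close to the rejected ones
```

Standardizing each feature keeps plain gradient descent well conditioned. Because the model is linear in the features, a new pair lying between two correctly classified training pairs receives the same label.
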

Record details

  • Author: Mangalath, Praful.
  • Author affiliation: University of Colorado at Boulder.
  • Degree-granting institution: University of Colorado at Boulder.
  • Subjects: Psychology, Cognitive; Computer Science.
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 119
  • Original format: PDF
  • Language: English
  • Date added: 2022-08-17 11:36:41