
More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis



Abstract

Computational models of lexical semantics, such as latent semantic analysis, can automatically generate semantic similarity measures between words from statistical redundancies in text. These measures are useful for experimental stimulus selection and for evaluating a model’s cognitive plausibility as a mechanism that people might use to organize meaning in memory. Although humans are exposed to enormous quantities of speech, practical constraints limit the amount of data that many current computational models can learn from. We follow up on previous work evaluating a simple metric of pointwise mutual information. Controlling for confounds in previous work, we demonstrate that this metric benefits from training on extremely large amounts of data and correlates more closely with human semantic similarity ratings than do publicly available implementations of several more complex models. We also present a simple tool for building simple and scalable models from large corpora quickly and efficiently.
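For readers unfamiliar with the metric named in the abstract, pointwise mutual information compares how often two words co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The following is a minimal sketch of that textbook definition, not the authors' tool. It assumes document-level co-occurrence, whitespace tokenization, and a toy corpus, and the names `pmi_table` and `docs` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build a pmi(w1, w2) function from document-level co-occurrence counts.

    Assumption: two words "co-occur" when they appear in the same document.
    """
    n_docs = len(documents)
    word_counts = Counter()   # number of documents containing each word
    pair_counts = Counter()   # number of documents containing both words

    for doc in documents:
        vocab = sorted(set(doc.lower().split()))
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))

    def pmi(w1, w2):
        a, b = sorted((w1, w2))
        joint = pair_counts[(a, b)]
        if joint == 0:
            # The pair was never observed together: PMI is undefined,
            # reported here as negative infinity.
            return float("-inf")
        # log2( p(a, b) / (p(a) * p(b)) ) with probabilities as document fractions
        return math.log2(joint * n_docs / (word_counts[a] * word_counts[b]))

    return pmi


# Toy corpus, invented purely for illustration.
docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
pmi = pmi_table(docs)
print(pmi("stocks", "markets"))
print(pmi("cat", "stocks"))
```

Because the counting step is a single pass over the corpus, a model of this kind can be trained on far more text than a model that requires a matrix decomposition, which is the trade-off the abstract highlights.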

Bibliographic details

  • Source
    Behavior Research Methods | 2009, Issue 3 | pp. 647-656 | 10 pages
  • Author affiliations

    Cognitive Science Program, Indiana University, 819 Eigenmann, 1910 E. 10th St., Bloomington, IN 47406-7512;

    Cognitive Science Program, Indiana University, 819 Eigenmann, 1910 E. 10th St., Bloomington, IN 47406-7512;

  • Original format: PDF
  • Language: English
