Comparison of Human and Latent Semantic Analysis (LSA) Judgements of Pairwise Document Similarities for a News Corpus

Abstract

Correlations between human and Latent Semantic Analysis (LSA) judgements of pairwise document similarity were explored on a set of 50 news documents. LSA is a modern and commonly used technique for the automatic determination of document similarity. LSA users must choose local and global weighting schemes, the number of factors to be retained, a stop word list, and whether to use a background document set. Global weighting schemes had more effect than local weighting schemes. Use of a stop word list almost always improved performance. Introducing a background set of similar documents increased the larger correlations and reduced the smaller ones. The correlations ranged between approximately 0 and 0.6 depending on the LSA settings, which indicates the importance of choosing the settings correctly. The low maximum correlation indicates that information presentation schemes based on LSA may often be at variance with visualisations based on human decisions, even when the best settings for a data set are used.
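The settings listed in the abstract can be made concrete with a minimal sketch. This is not the report's implementation: it assumes log-entropy weighting as one possible local/global scheme, a toy stop word list, cosine similarity in the reduced space, and Pearson correlation over document pairs. The documents, the `human` ratings, and all function names are hypothetical and exist only for illustration.

```python
import math
import re

import numpy as np

# Toy stop word list (an assumption; the report's list is not given).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "to", "on", "after", "as"}


def tokenize(text, stop_words=None):
    """Lowercase, keep alphabetic tokens, optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if stop_words:
        tokens = [t for t in tokens if t not in stop_words]
    return tokens


def term_document_matrix(docs, stop_words=None):
    """Raw term counts, one row per term and one column per document."""
    tokenized = [tokenize(d, stop_words) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            counts[index[t], j] += 1
    return counts


def log_entropy(counts):
    """Local log weighting multiplied by a global entropy weight per term."""
    n_docs = counts.shape[1]
    if n_docs < 2:
        raise ValueError("entropy weighting needs at least two documents")
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / math.log(n_docs)
    return np.log1p(counts) * global_w[:, None]


def lsa_similarities(docs, n_factors, stop_words=None, background=()):
    """Cosine similarities between `docs` in a truncated SVD space.

    Background documents, if given, help build the space but are
    excluded from the returned matrix.
    """
    all_docs = list(docs) + list(background)
    weighted = log_entropy(term_document_matrix(all_docs, stop_words))
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    k = min(n_factors, len(s))
    doc_vecs = (s[:k, None] * vt[:k]).T  # one row per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    unit = doc_vecs / norms
    n = len(docs)
    return (unit @ unit.T)[:n, :n]


def pairwise_correlation(sim_a, sim_b):
    """Pearson correlation over the distinct document pairs (upper triangle)."""
    iu = np.triu_indices_from(sim_a, k=1)
    return float(np.corrcoef(sim_a[iu], sim_b[iu])[0, 1])


# Hypothetical documents and made-up human ratings, for illustration only.
docs = [
    "The central bank raised interest rates to slow inflation.",
    "Inflation fears pushed the bank to raise interest rates again.",
    "The football team won the cup final after extra time.",
    "Fans celebrated as the team lifted the cup.",
]
human = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

sims = lsa_similarities(docs, n_factors=2, stop_words=STOP_WORDS)
r = pairwise_correlation(sims, human)
print(f"correlation with human ratings: {r:.2f}")
```

Changing `n_factors`, passing `stop_words=None`, supplying a `background` list, or swapping `log_entropy` for another weighting function gives a way to vary the same settings the abstract describes and to see how the correlation responds.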
