Journal of the American Society for Information Science

Discovering Latent Topical Structure by Second-Order Similarity Analysis

Abstract

Computing document similarity directly from a "bag of words" vector space model can be problematic because term independence causes the relationships between synonymous terms and the contextual influences that determine the sense of polysemous terms to be ignored. This study compares two methods that potentially address these problems by deriving the higher order relationships that lie latent within the original first-order space. The first is latent semantic analysis (LSA), a dimension reduction method that is a well-known means of addressing the vocabulary mismatch problem in information retrieval systems. The second is the lesser known yet conceptually simple approach of second-order similarity (SOS) analysis, whereby latent similarity is measured in terms of mutual first-order similarity. Nearest neighbour tests show that SOS analysis derives similarity models that are superior to both first-order and LSA-derived models at both coarse and fine levels of semantic granularity. SOS analysis has been criticized for its computational complexity. A second contribution is the novel application of vector truncation to reduce runtime by a constant factor. Speed-ups of 4 to 10 times are achievable without compromising the structural gains achieved by full-vector SOS analysis.
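
The idea of measuring latent similarity "in terms of mutual first-order similarity" can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything here is an assumption made for illustration: cosine is used as the similarity measure at both levels, truncation keeps the `k` largest entries of each first-order similarity vector, and the function names and toy corpus are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for empty rows
    U = X / norms
    return U @ U.T

def truncate_rows(S, k):
    """Keep the k largest entries in each row of S and zero the rest.

    Assumed truncation scheme; the abstract only says vectors are truncated.
    """
    if k >= S.shape[1]:
        return S.copy()
    T = np.zeros_like(S)
    idx = np.argpartition(S, -k, axis=1)[:, -k:]   # column indices of the top k
    rows = np.arange(S.shape[0])[:, None]
    T[rows, idx] = S[rows, idx]
    return T

def second_order_similarity(X, k=None):
    """Compare documents by their (optionally truncated) first-order
    similarity vectors instead of by their term vectors."""
    S1 = cosine_similarity_matrix(X)
    if k is not None:
        S1 = truncate_rows(S1, k)
    return cosine_similarity_matrix(S1)

# Hypothetical toy corpus; columns are the terms: car, automobile, banana
docs = np.array([
    [1, 1, 0],   # doc 0: "car automobile"
    [1, 0, 0],   # doc 1: "car"
    [0, 1, 0],   # doc 2: "automobile"
    [0, 0, 1],   # doc 3: "banana"
], dtype=float)

S1 = cosine_similarity_matrix(docs)
S2 = second_order_similarity(docs)
S2_trunc = second_order_similarity(docs, k=2)

print("first-order  sim(doc1, doc2):", S1[1, 2])
print("second-order sim(doc1, doc2):", round(S2[1, 2], 4))
print("truncated    sim(doc1, doc2):", round(S2_trunc[1, 2], 4))
```

In this toy corpus, documents 1 and 2 share no terms, so their first-order similarity is zero, yet both resemble document 0 and therefore receive a second-order similarity of 1/3, while the unrelated document 3 stays at zero. The dense arrays here give no speed-up from truncation; any runtime benefit of the kind the abstract reports would depend on storing the truncated vectors sparsely, which this sketch does not attempt.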
