Latent semantic analysis for multiple-type interrelated data objects

Abstract

Co-occurrence data are common in many real-world applications. Latent Semantic Analysis (LSA) has been used successfully to identify semantic relations in such data. However, LSA can handle only a single co-occurrence relationship between two types of objects. In practice, many applications involve multiple types of objects, and any pair of these object types may have a pairwise co-occurrence relation. All of these co-occurrence relations can be exploited to alleviate data sparseness or to represent objects more meaningfully. In this paper, we propose a novel algorithm, M-LSA, which performs latent semantic analysis by incorporating all pairwise co-occurrences among multiple types of objects. Based on the mutual reinforcement principle, M-LSA identifies the most salient concepts in the co-occurrence data and represents all objects in a unified semantic space. M-LSA is general, and we show that several variants of LSA are special cases of our algorithm. Experimental results show that M-LSA outperforms LSA in multiple applications, including collaborative filtering, text clustering, and text categorization.
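
The abstract does not spell out how the pairwise co-occurrences are combined, so the sketch below assumes one natural construction and should not be read as the paper's exact formulation. Each pairwise co-occurrence matrix R_ab is placed in block (a, b) of a single symmetric matrix M, with its transpose in block (b, a), and the top-k eigenvectors of M give every object coordinates in one shared space. The eigen-equation M x = λ x is one reading of mutual reinforcement: an object's weight on a concept is proportional to the summed weights of the objects it co-occurs with. The function names, the optional per-relation `weights`, the toy user/item/tag data, and the use of the Jacobi eigenvalue method (chosen only to keep the example dependency-free) are all hypothetical.

```python
import math


def build_unified_matrix(sizes, relations, weights=None):
    """Stack pairwise co-occurrence matrices into one symmetric block matrix.

    sizes:     {type_name: number_of_objects_of_that_type}
    relations: {(type_a, type_b): R}, R a list of rows, sizes[a] x sizes[b]
    weights:   hypothetical per-relation weights {(type_a, type_b): float}
    """
    offset, start = {}, 0
    for t, n_t in sizes.items():
        offset[t] = start
        start += n_t
    m = [[0.0] * start for _ in range(start)]
    for (a, b), r in relations.items():
        assert a != b, "this sketch only handles relations between distinct types"
        w = 1.0 if weights is None else weights.get((a, b), 1.0)
        for i in range(sizes[a]):
            for j in range(sizes[b]):
                # Block (a, b) holds w*R and block (b, a) holds w*R^T.
                m[offset[a] + i][offset[b] + j] = w * r[i][j]
                m[offset[b] + j][offset[a] + i] = w * r[i][j]
    return m, offset


def jacobi_eigh(a, tol=1e-18, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, v) where column c of v is the eigenvector for
    eigenvalues[c]. Dense and O(n^3) per sweep: suitable for toy data only.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def m_lsa_embed(sizes, relations, k, weights=None):
    """Assumed M-LSA-style embedding: top-k eigenvectors of the unified matrix.

    Returns (top_k_eigenvalues, {type_name: [k-dim vector per object]}).
    """
    m, offset = build_unified_matrix(sizes, relations, weights)
    vals, vecs = jacobi_eigh(m)
    order = sorted(range(len(vals)), key=lambda c: vals[c], reverse=True)[:k]
    emb = {
        t: [[vecs[offset[t] + i][c] for c in order] for i in range(n_t)]
        for t, n_t in sizes.items()
    }
    return [vals[c] for c in order], emb


if __name__ == "__main__":
    # Two types, one relation: the unified matrix is [[0, A], [A^T, 0]],
    # whose positive eigenvalues are the singular values of A (plain LSA).
    A = [[3.0, 0.0], [0.0, 1.0]]
    vals, emb = m_lsa_embed({"doc": 2, "term": 2}, {("doc", "term"): A}, k=2)
    print([round(x, 6) for x in vals])  # [3.0, 1.0]

    # Three hypothetical types, every pair related: users, items, tags.
    sizes = {"user": 2, "item": 3, "tag": 2}
    relations = {
        ("user", "item"): [[1, 0, 2], [0, 1, 1]],
        ("item", "tag"): [[1, 0], [0, 1], [1, 1]],
        ("user", "tag"): [[1, 0], [0, 2]],
    }
    vals, emb = m_lsa_embed(sizes, relations, k=2)
    print(emb["item"])  # one 2-dim concept vector per item
```

With two object types and a single relation A, the unified matrix is [[0, A], [A^T, 0]]; its positive eigenvalues are the nonzero singular values of A, and each eigenvector stacks the matching left and right singular vectors (scaled by 1/√2). That is the sense in which plain LSA falls out as a special case of this sketch. For realistic, sparse co-occurrence data, a sparse symmetric eigensolver such as SciPy's `scipy.sparse.linalg.eigsh` would be a more practical choice than the dense Jacobi loop.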