Conference: IEEE International Conference on Semantic Computing

Semantic-Based Text Document Clustering Using Cognitive Semantic Learning and Graph Theory

Abstract

Semantic-based text document clustering aims to group documents into a set of topic clusters. We propose a new approach to the semantic clustering of text documents based on cognitive science and graph theory. We apply a computational cognitive model of semantic association for human semantic memory, known as Incremental Construction of an Associative Network (ICAN). The vector-based model of Latent Semantic Analysis (LSA) has been a leading computational cognitive model for semantic learning and topic modeling, but it has well-known limitations: it ignores the original word order, and it performs semantic reduction with neither a linguistic nor a cognitive basis. The ICAN model overcomes these issues. The ICAN model is used to generate document-level semantic graphs. Cognitive and graph-based topic and context identification methods are used for semantic reduction of the ICAN graphs. A corpus graph is generated from the ICAN graphs, and a community-detection graph algorithm is then applied as the final step of document clustering. Experiments are conducted on three commonly used datasets. Measured by the purity and entropy criteria for clustering quality, our results show a notable improvement over the LSA-based approach.
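
The abstract names purity and entropy as its clustering-quality criteria but does not give their formulas. The sketch below uses the common textbook definitions, which may differ in detail from the ones used in the paper. Purity is the fraction of documents that fall in the majority class of their cluster, and entropy is the size-weighted class entropy of each cluster. The function name `purity_and_entropy`, the toy labels, and the normalization by the logarithm of the number of classes are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference classes.

    Hypothetical helper: standard definitions, not taken from the paper.
    """
    if len(cluster_ids) != len(class_labels) or not cluster_ids:
        raise ValueError("inputs must be non-empty and of equal length")

    # Group the reference class labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    n = len(cluster_ids)
    n_classes = len(set(class_labels))
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        # Purity: share of all documents covered by this cluster's majority class.
        purity += max(counts.values()) / n
        # Class entropy inside this cluster (natural log).
        h = -sum((c / len(labels)) * math.log(c / len(labels)) for c in counts.values())
        if n_classes > 1:
            h /= math.log(n_classes)  # assumed normalization to the range [0, 1]
        # Weight each cluster's entropy by its share of the documents.
        entropy += (len(labels) / n) * h
    return purity, entropy


# Toy data, invented for illustration only.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["sport", "sport", "politics", "politics", "politics", "politics"]
purity, entropy = purity_and_entropy(clusters, classes)
print(f"purity={purity:.3f} entropy={entropy:.3f}")
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference topics, which is the sense in which the two criteria are usually reported together.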