Conference: International Conference on Knowledge Science, Engineering and Management

A Document Clustering Algorithm Based on Semi-constrained Hierarchical Latent Dirichlet Allocation



Abstract

The bag-of-words model used by some clustering methods is often unsatisfactory because it ignores relationships between important terms that do not literally co-occur. This paper proposes a document clustering algorithm based on semi-constrained Hierarchical Latent Dirichlet Allocation (HLDA). Frequent itemsets are used as the input to the algorithm. Keywords are extracted from the original corpus as prior knowledge, and each keyword is associated with an internal node, which is treated as a constrained node and adds a constraint to the path sampling process. Experimental results show that the semi-constrained HLDA algorithm outperforms the LDA, HLDA, and semi-constrained LDA algorithms.
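
The abstract does not spell out how the constraint enters path sampling, so the following is only an illustrative sketch of one plausible reading: a document containing a keyword may only be assigned to tree paths that pass through that keyword's constrained node. The names `candidate_paths`, `sample_path`, and `keyword_to_node`, the path weights, and the fallback for conflicting constraints are all assumptions made for illustration, not the authors' implementation.

```python
import random

def candidate_paths(paths, doc_terms, keyword_to_node):
    """Return the root-to-leaf paths a document is allowed to take.

    paths: list of tuples of node ids, each running from root to leaf.
    doc_terms: set of terms in the document (e.g. drawn from frequent itemsets).
    keyword_to_node: hypothetical mapping from keyword to its constrained
        internal node id.
    """
    # Constrained nodes triggered by keywords that appear in this document.
    required = {keyword_to_node[t] for t in doc_terms if t in keyword_to_node}
    if not required:
        return list(paths)  # no keyword present: the document is unconstrained
    allowed = [p for p in paths if required.issubset(p)]
    # Assumption of this sketch: if the constraints conflict, fall back to all paths.
    return allowed or list(paths)

def sample_path(paths, weights, doc_terms, keyword_to_node, rng=random):
    """Sample one allowed path in proportion to its unnormalized weight.

    weights: dict mapping each path to a positive weight, standing in for the
        prior-times-likelihood score a real HLDA sampler would compute.
    """
    allowed = candidate_paths(paths, doc_terms, keyword_to_node)
    return rng.choices(allowed, weights=[weights[p] for p in allowed], k=1)[0]
```

In this sketch, a document with no keywords is sampled exactly as in unconstrained HLDA, which is what makes the scheme "semi"-constrained under this reading.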
