Semantic based clustering of Web documents

Abstract

A new methodology is developed that structures the semantics of a collection of documents into the geometry of a simplicial complex: a primitive concept is represented by a top-dimensional simplex, and a connected component of such simplices represents a concept. Based on these structures, documents can be clustered into meaningful classes. Experiments on three different data sets, drawn from web pages and the medical literature, show that the proposed unsupervised clustering approach performs significantly better than traditional clustering algorithms such as k-means, AutoClass, and hierarchical agglomerative clustering (HAC). This abstract geometric model appears to capture the intrinsic semantics of the documents.
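The abstract does not say how the simplices are built or how documents are assigned to concepts, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes that a simplex is a set of up to `max_size` terms co-occurring in at least `min_support` documents, keeps only the maximal (top-dimensional) term sets as primitive concepts, groups them into concepts by q-connectivity (two simplices are linked when they share at least q + 1 vertices, i.e. a common q-face), and labels each document with the concept whose terms it overlaps most. All function names and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_termsets(docs, min_support=2, max_size=3):
    """Assumed candidate simplices: term sets of size <= max_size in >= min_support docs."""
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        for k in range(1, max_size + 1):
            for combo in combinations(terms, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}


def maximal_simplices(simplices):
    """Keep simplices that are not a proper face of another one (primitive concepts)."""
    tops = [s for s in simplices if not any(s < t for t in simplices)]
    return sorted(tops, key=sorted)  # deterministic order, independent of set hashing


def q_connected_components(simplices, q=0):
    """Merge simplices that share at least q + 1 vertices, transitively (union-find)."""
    parent = list(range(len(simplices)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(simplices)), 2):
        if len(simplices[i] & simplices[j]) >= q + 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(simplices):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


def cluster_documents(docs, components):
    """Label each doc with the concept whose vertex set it overlaps most; -1 if none.

    Ties go to the lower-numbered concept.
    """
    concept_terms = [frozenset().union(*comp) for comp in components]
    labels = []
    for doc in docs:
        terms = set(doc)
        scores = [len(terms & ct) for ct in concept_terms]
        if not scores or max(scores) == 0:
            labels.append(-1)
        else:
            labels.append(scores.index(max(scores)))
    return labels


def semantic_clusters(docs, min_support=2, max_size=3, q=0):
    """Sketch pipeline: term sets -> maximal simplices -> q-connected concepts -> labels."""
    tops = maximal_simplices(frequent_termsets(docs, min_support, max_size))
    return cluster_documents(docs, q_connected_components(tops, q))


if __name__ == "__main__":
    docs = [
        ["tumor", "cancer", "therapy"],
        ["tumor", "cancer", "biopsy"],
        ["web", "page", "link"],
        ["web", "page", "search"],
        ["stock", "market"],
    ]
    print(semantic_clusters(docs))  # [0, 0, 1, 1, -1]
```

With `q=0`, a single shared term is enough to chain two primitive concepts into one concept; raising `q` demands larger shared faces and therefore splits the complex into more, tighter concepts. Documents that touch no concept receive the label `-1`, which is one simple way to leave outliers unclustered.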