Conference: ISCA International Conference on Computer and Their Applications

A WORD-BASED SOFT CLUSTERING ALGORITHM FOR DOCUMENTS

Abstract

Document clustering is an important tool for applications such as Web search engines because it gives the user a good overall view of the information contained in the documents. However, existing algorithms have drawbacks: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose WBSC (Word-Based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. WBSC uses a hierarchical approach to cluster documents that share similar words. Compared with existing hard clustering algorithms such as K-means and its variants, WBSC is both effective and efficient.
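
The abstract does not spell out WBSC's similarity measure or merge rule, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that each word starts as a cluster holding the documents that contain it, that Jaccard similarity between document sets stands in for the "given similarity measure", and that clusters are merged greedily while the similarity stays above a threshold. The function names and the `threshold` and `min_docs` parameters are hypothetical.

```python
from itertools import combinations


def initial_word_clusters(docs, min_docs=2):
    """Build one starting cluster per word: word -> ids of documents containing it."""
    by_word = {}
    for doc_id, text in docs.items():
        # sorted() keeps the insertion order reproducible between runs
        for word in sorted(set(text.lower().split())):
            by_word.setdefault(word, set()).add(doc_id)
    # Assumption: words that occur in fewer than min_docs documents are dropped.
    return {frozenset([w]): ids for w, ids in by_word.items() if len(ids) >= min_docs}


def jaccard(a, b):
    """Jaccard similarity of two non-empty document-id sets (assumed measure)."""
    return len(a & b) / len(a | b)


def soft_cluster(docs, threshold=0.5, min_docs=2):
    """Greedily merge the most similar pair of word clusters until none reaches threshold."""
    clusters = initial_word_clusters(docs, min_docs)
    while True:
        best = None
        for (w1, d1), (w2, d2) in combinations(clusters.items(), 2):
            sim = jaccard(d1, d2)
            if sim >= threshold and (best is None or sim > best[0]):
                best = (sim, w1, w2)
        if best is None:
            return clusters
        _, w1, w2 = best
        # The merged cluster is labelled by both word sets and holds both document sets.
        merged_docs = clusters.pop(w1) | clusters.pop(w2)
        clusters[w1 | w2] = merged_docs


docs = {
    "d1": "apple banana",
    "d2": "apple banana car engine",
    "d3": "car engine",
}
clusters = soft_cluster(docs)
for words, doc_ids in clusters.items():
    print(sorted(words), sorted(doc_ids))
```

In this toy input, document `d2` ends up in both resulting clusters, which is the behaviour that distinguishes soft clustering from a hard method such as K-means. The pairwise search in each round is quadratic in the number of clusters and is kept for readability; it makes no claim about the efficiency of WBSC itself.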
