Journal: Concurrency and Computation: Practice and Experience

Parallelization of a graph-cut based algorithm for hierarchical clustering of web documents

Abstract

We propose a parallelization scheme for an existing algorithm for constructing a web directory that contains categories of web documents organized hierarchically. The clustering algorithm automatically infers the number of clusters using a quality function based on graph cuts. A parallel implementation of the algorithm has been developed to run on a cluster of multi-core processors interconnected by an intranet. The effect of the well-known Latent Semantic Indexing on the performance of the clustering algorithm is also considered. The parallelized graph-cut based clustering algorithm achieves an F-measure in the range [0.69, 0.91] for the generated leaf-level clusters while yielding a precision-recall performance in the range [0.66, 0.84] for the entire hierarchy of the generated clusters. As measured via empirical observations, the parallel algorithm achieves an average speedup of 7.38 over its sequential variant, at the same time yielding a better clustering performance than the sequential algorithm in terms of F-measure.
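
The abstract reports results as an F-measure and a speedup without defining either. As a minimal sketch, assuming the common definitions (F-measure as the balanced harmonic mean of precision and recall, speedup as sequential time divided by parallel time), the two quantities could be computed as below. The function names and the timing values are hypothetical, and the paper itself may use a cluster-specific variant of the F-measure.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumed definition; the abstract does not state which variant is used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def speedup(sequential_seconds: float, parallel_seconds: float) -> float:
    """Speedup as the ratio of sequential to parallel wall-clock time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel_seconds must be positive")
    return sequential_seconds / parallel_seconds


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the arithmetic.
    print(f_measure(0.5, 1.0))
    print(speedup(738.0, 100.0))
```

Under these assumed definitions, a hypothetical run taking 738 s sequentially and 100 s in parallel would correspond to the reported average speedup of 7.38.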
