Parallelization of a graph-cut based algorithm for hierarchical clustering of web documents

Karthick Seshadri; S. Mercy Shalinie

Abstract

We propose a parallelization scheme for an existing algorithm for constructing a web directory that contains categories of web documents organized hierarchically. The clustering algorithm automatically infers the number of clusters using a quality function based on graph cuts. A parallel implementation of the algorithm has been developed to run on a cluster of multi-core processors interconnected by an intranet. The effect of the well-known Latent Semantic Indexing on the performance of the clustering algorithm is also considered. The parallelized graph-cut based clustering algorithm achieves an F-measure in the range [0.69, 0.91] for the generated leaf-level clusters while yielding a precision-recall performance in the range [0.66, 0.84] for the entire hierarchy of the generated clusters. As measured via empirical observations, the parallel algorithm achieves an average speedup of 7.38 over its sequential variant, at the same time yielding a better clustering performance than the sequential algorithm in terms of F-measure.
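The abstract reports results as an F-measure and a speedup without defining either. As a rough illustration only, the sketch below computes a class-size-weighted, best-match F-measure for a flat clustering and a plain speedup ratio. Both definitions are assumptions: the paper may use a different F-measure variant, and the function names `clustering_f_measure` and `speedup` are hypothetical, not taken from the authors' implementation.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_ids):
    """Class-size-weighted best-match F-measure of a flat clustering.

    Assumed definition (not confirmed by the abstract): for every true
    class, take the best F1 score over all clusters, then average those
    scores weighted by class size.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # overlap[(cls, clu)] = number of documents of class cls in cluster clu
    overlap = Counter(zip(true_labels, cluster_ids))

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both == 0:
                continue
            precision = n_both / n_clu
            recall = n_both / n_cls
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        total += (n_cls / n) * best
    return total


def speedup(t_sequential, t_parallel):
    """Speedup as the ratio of sequential to parallel running time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_sequential / t_parallel
```

Under these assumed definitions, a clustering that reproduces the true classes exactly scores 1.0, and a parallel run that takes a quarter of the sequential time has a speedup of 4.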

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2015, Issue 17 | pp. 5156-5176 | 21 pages
Authors
Karthick Seshadri; S. Mercy Shalinie
Author affiliations

Department of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai - 625015, Tamil Nadu, India

Department of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai - 625015, Tamil Nadu, India

Indexing information
Original format: PDF
Language: English
Keywords
parallel hierarchical clustering; latent semantic indexing; cut-trees; singular value decomposition; text mining;
