Journal: Concurrency, practice and experience

A distributed parallel algorithm for inferring hierarchical groups from large-scale text corpuses

Abstract

We propose a distributed parallel algorithm for inferring the hierarchical groups present in a large-scale text corpus. The algorithm is designed to deal with corpuses that typically do not fit into the main memory of a workstation computer. The key contribution of this paper lies in its proposal and verification of a parallel distributed algorithm that exploits the advantages of two complementary techniques based on (i) localized modularity optimization and (ii) spectral clustering. Based on our experimental observations, these are complementary in the sense that the former excels at finding coarse groups in a large-scale network, while the latter demands a heavy memory footprint but is effective in inferring tightly knit fine-grained groups. Empirical evaluation of the distributed implementation scheme shows that the algorithm exhibits a significant speed-up when compared to existing algorithms like Louvain and, at the same time, produces better-quality clusters than either Louvain or spectral clustering algorithms in terms of the F-score and Rand index.
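To make two of the quantities named in the abstract concrete, the following is a minimal sketch of the standard Newman modularity score (the objective that Louvain-style methods optimize) and of the Rand index used to compare a clustering against reference labels. It is not the paper's implementation: the function names `modularity` and `rand_index`, the edge-list input format, and the toy graph are all assumptions made for illustration, and the sketch assumes an undirected, unweighted graph without self-loops.

```python
from itertools import combinations


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, no self-loops (assumed input format).
    community: dict mapping each node to its group label.
    """
    m = len(edges)
    intra = {}         # number of edges with both endpoints in the group
    total_degree = {}  # sum of node degrees per group
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # Q = sum over groups of (intra-group edge fraction - expected fraction)
    return sum(
        intra.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, community)
ri = rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1])
```

Splitting the toy graph into its two triangles gives a positive modularity, while placing every node in one group gives zero, which is why modularity optimization tends to recover coarse groups first.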