Journal: Algorithmica

Conquest: A Coarse-Grained Algorithm for Constructing Summaries of Distributed Discrete Datasets

Abstract

In this paper we present a coarse-grained parallel algorithm, CONQUEST, for constructing bounded-error summaries of high-dimensional binary attributed data in a distributed environment. Such summaries enable more expensive analysis techniques to be applied efficiently under constraints on computation, communication, and privacy with little loss in accuracy. While the discrete and high-dimensional nature of the dataset makes the problem difficult in its serial formulation, the loose coupling of the distributed servers hosting the data and the heterogeneity in network bandwidth present additional challenges. CONQUEST is based on a novel linear algebraic tool, PROXIMUS, which has been shown to be highly effective on a serial platform. In contrast to traditional fine-grained parallel techniques that distribute the kernel operations, CONQUEST adopts a coarse-grained parallel formulation that relies on the principle of sampling to reduce communication overhead while maintaining high accuracy. Specifically, each individual site computes its local patterns independently. The sites then cooperate in dynamically orchestrated work groups to construct consensus patterns from these local patterns. Individual sites may then decide to continue their participation in the consensus or leave the group. Such a parallel formulation implicitly resolves load-balancing and privacy issues while significantly reducing communication volume. Experimental results on an Intel Xeon cluster demonstrate that this strategy achieves excellent performance in terms of compression time, compression ratio, and accuracy with respect to post-processing tasks.
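
The two-stage idea described above (local patterns per site, then consensus patterns across sites) can be illustrated with a small sketch. This is not the CONQUEST or PROXIMUS implementation. It assumes, as a simplification the abstract does not spell out, that a local pattern is a binary vector fitted by an alternating rank-one heuristic and accepted for rows within a given Hamming radius. The function names (`rank_one`, `summarize`, `consensus`), the `radius` parameter, and the greedy peeling strategy are hypothetical, and the sampling and work-group orchestration steps are omitted.

```python
from typing import List, Tuple

Row = List[int]
Summary = List[Tuple[Row, int]]  # (pattern, number of rows it represents)


def hamming(a: Row, b: Row) -> int:
    """Number of positions where two binary vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def rank_one(rows: List[Row], iters: int = 10) -> Tuple[List[int], Row]:
    """Alternating heuristic for a binary rank-one fit rows ~ x * y^T.

    Assumption: this stands in for the serial pattern-finding step;
    it is not taken from the source.
    """
    y = list(rows[0])  # seed the pattern with the first row
    x = [0] * len(rows)
    for _ in range(iters):
        ny = sum(y)
        # Fix y: mark row i as present if it covers at least half of y's ones.
        x = [
            1 if ny > 0 and 2 * sum(r[j] & y[j] for j in range(len(y))) >= ny else 0
            for r in rows
        ]
        nx = sum(x)
        if nx == 0:
            break
        # Fix x: set column j if at least half of the present rows have a one there.
        new_y = [
            1 if 2 * sum(r[j] for r, xi in zip(rows, x) if xi) >= nx else 0
            for j in range(len(y))
        ]
        if new_y == y:
            break
        y = new_y
    return x, y


def summarize(rows: List[Row], radius: int) -> Summary:
    """Greedy peeling (hypothetical): extract patterns until every row is covered."""
    remaining = list(rows)
    out: Summary = []
    while remaining:
        x, y = rank_one(remaining)
        keep = [bool(xi) and hamming(r, y) <= radius for r, xi in zip(remaining, x)]
        if not any(keep):
            # Fall back to the seed row as its own exact pattern.
            y = list(remaining[0])
            keep = [r == y for r in remaining]
        out.append((list(y), sum(keep)))
        remaining = [r for r, k in zip(remaining, keep) if not k]
    return out


def consensus(site_summaries: List[Summary], radius: int) -> Summary:
    """Pool local patterns only (no raw rows) and summarize them again."""
    pooled = [(p, w) for summary in site_summaries for p, w in summary]
    patterns = [p for p, _ in summarize([p for p, _ in pooled], radius)]
    weights = [0] * len(patterns)
    for p, w in pooled:
        # Credit each local pattern's row count to its nearest consensus pattern.
        k = min(range(len(patterns)), key=lambda i: hamming(p, patterns[i]))
        weights[k] += w
    return list(zip(patterns, weights))


if __name__ == "__main__":
    site_a = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
    site_b = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    local = [summarize(site_a, radius=1), summarize(site_b, radius=1)]
    print(consensus(local, radius=1))
```

In this sketch only the pattern vectors and their row counts leave a site, which is one way to read the abstract's claim that communication volume and privacy exposure are reduced; how CONQUEST actually forms work groups and samples data is not described in the abstract and is not modeled here.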