Computing Contingency Statistics in Parallel: Design Trade-Offs and Limiting Cases

Abstract

Statistical analysis is typically used to reduce the dimensionality of data and to infer meaning from it. A key challenge for any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, from which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and χ² independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference from moment-based statistics (which we discussed in [1]), where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs that we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speedup and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
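
As an illustration of the map-reduce pattern the abstract describes, the following is a minimal Python sketch, not the authors' implementation. It assumes the data consists of (x, y) pairs of categorical values split into partitions; the function names `local_table`, `merge_tables`, and `derived_statistics` are hypothetical, and logarithms are natural (statistics in nats). Each partition builds a local contingency table, the partial tables are merged by adding counts, and the derived statistics are read off the merged table.

```python
from collections import Counter
from functools import reduce
from math import log


def local_table(pairs):
    """Map step: count (x, y) occurrences within one data partition."""
    return Counter(pairs)


def merge_tables(a, b):
    """Reduce step: add the counts of two partial contingency tables."""
    return a + b


def derived_statistics(table):
    """Derive statistics from a merged contingency table of counts."""
    n = sum(table.values())
    rows, cols = Counter(), Counter()  # marginal counts of x and of y
    for (x, y), c in table.items():
        rows[x] += c
        cols[y] += c
    joint = {xy: c / n for xy, c in table.items()}
    # Point-wise mutual information: log(p(x, y) / (p(x) * p(y))).
    pmi = {(x, y): log(c * n / (rows[x] * cols[y]))
           for (x, y), c in table.items()}
    # Information entropy of the joint distribution.
    entropy = -sum(p * log(p) for p in joint.values())
    # Chi-square independence statistic over all cells, including empty ones.
    chi2 = 0.0
    for x in rows:
        for y in cols:
            expected = rows[x] * cols[y] / n
            chi2 += (table[(x, y)] - expected) ** 2 / expected
    return {
        "joint": joint,
        "marginal_x": {x: c / n for x, c in rows.items()},
        "marginal_y": {y: c / n for y, c in cols.items()},
        "pmi": pmi,
        "entropy": entropy,
        "chi2": chi2,
    }


# Hypothetical data: two partitions, as if held by two processes.
partitions = [
    [("a", "u"), ("a", "u"), ("b", "v")],
    [("a", "v"), ("b", "v"), ("b", "v")],
]
table = reduce(merge_tables, map(local_table, partitions))
stats = derived_statistics(table)
print(len(table), stats["chi2"])
```

The number of entries in each partial table, `len(table)`, is what has to be exchanged in the reduce step. When most observed pairs are distinct (the quasi-diffuse case), that number grows with the data size, whereas a moment-based statistic exchanges a fixed number of values regardless of data size.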

Bibliographic details

Source: 2010 IEEE International Conference on Cluster Computing, 2010, pp. 156-165 (10 pages)
Authors: Pebay Philippe; Thompson David; Bennett Janine
Original format: PDF
CLC classification: Molecular biology
