A performance study of the chain sampling algorithm

机译：链采样算法的性能研究

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

On-line data stream analysis is an important challenge today because of the always-increasing rates of the streams issued from multiple heterogeneous sources, in many application domains. To reduce the amount of the data stream, several sampling methods were designed by the data stream research community. We focus in this paper, on the chain sampling algorithm proposed by Babcock et al. The aim of this algorithm is to select randomly and at any time, a given fixed proportion from the most recent items of the stream contained in the last sliding window. This algorithm is well adapted to the stream context, as only one pass over the data is performed. Moreover it uses a small memory, as it does not store all the items of the current sliding window. We show in this paper that the chain sampling algorithm suffers from some collision or redundancy problems. The collision occurs when the same item is selected as a sample more than once during the execution of the algorithm. We propose two approaches to overcome this weakness and improve the chain sampling algorithm. The first one is called “inverting the selection for a high sampling rate” and the second one is inspired from the “divide to conquer strategy”. Different experimentations are performed to show the efficiency of these two improvements, in particular their impact on the execution time of the algorithm.

机译：由于在许多应用领域中从多个异构源发出的数据流的比率一直在不断增长，因此在线数据流分析是当今的一项重要挑战。为了减少数据流的数量，数据流研究社区设计了几种采样方法。在本文中，我们将重点放在Babcock等人提出的链采样算法上。该算法的目的是随时随地从最后一个滑动窗口中包含的流的最新项中随机选择给定的固定比例。该算法非常适合流上下文，因为仅执行一次数据传递。此外，由于它不存储当前滑动窗口的所有项目，因此它使用的内存很小。我们在本文中证明链采样算法存在一些冲突或冗余问题。在算法执行过程中多次选择同一项目作为样本时发生冲突。我们提出了两种方法来克服此缺点并改进链采样算法。第一个称为“以高采样率反转选择”，第二个则受“分而治之”的启发。进行不同的实验以显示这两个改进的效率，尤其是它们对算法执行时间的影响。

著录项

来源
《2015 IEEE Seventh International Conference on Intelligent Computing and Information Systems》|2015年|487-494|共8页
会议地点 Cairo(EG)
作者
Rayane El Sibai; Yousra Chabchoub; Jacques Demerjian; Zakia Kazi-Aoul; Kabalan Barbar;
展开▼
作者单位

LISITE Laboratory, ISEP, Paris, France;

展开▼
会议组织
原文格式 PDF
正文语种 eng
中图分类
关键词
chain sampling algorithm; data stream; data stream sampling; on-line analysis; sliding window;

机译：链采样算法数据流数据流采样在线分析滑动窗口;
入库时间 2022-08-26 13:53:57

相似文献

外文文献
中文文献
专利

1. Generalization performance of least-square regularized regression algorithm with Markov chain samples [J] . Zou B., Li L., Xu Z. Journal of Mathematical Analysis and Applications . 2012,第1期

机译：马尔可夫链样本的最小二乘正则回归算法的广义性能
2. Generalization performance of least-square regularized regression algorithm with Markov chain samples [J] . Zou B., Li L., Xu Z. Journal of Mathematical Analysis and Applications . 2012,第1期

机译：马尔可夫链样本的最小二乘正则回归算法的广义性能
3. Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study [J] . KanthidaKusonmano, MichaelNetzer, ChristianBaumgartner, ScientificWorldJournal . 2012,第3期

机译：汇集样品对分类算法性能的影响：比较研究
4. A performance study of the chain sampling algorithm [C] . Rayane El Sibai, Yousra Chabchoub, Jacques Demerjian, IEEE International Conference on Intelligent Computing and Information Systems . 2015

机译：链采样算法的性能研究
5. Performance Prediction of Hyperspectral Target Detection Algorithms via Importance Sampling [D] . Loughlin, Cooper. 2018

机译：基于重要性采样的高光谱目标检测算法性能预测
6. Effects of Pooling Samples on the Performance of Classification Algorithms: A Comparative Study [O] . Kanthida Kusonmano, Michael Netzer, Christian Baumgartner, 2012

机译：混合样本对分类算法性能的影响：比较研究
7. The sampling properties of some distance geometry algorithms applied to unconstrained polypeptide chains: A study of 1830 independently computed conformations [O] . Timothy F. Havel 1990

机译：一些距离几何算法的采样特性应用于无约束多肽链：1830年独立计算构象的研究

A performance study of the chain sampling algorithm

摘要

著录项

相似文献

相关主题

期刊订阅