Journal: Expert Systems with Applications

ProTraS: A probabilistic traversing sampling algorithm

Abstract

In knowledge discovery on big data, sampling is a building block that can be embedded in a more general framework to speed up existing algorithms and improve scalability. Two challenging, connected problems arise as complexity grows: tuning and timing. ProTraS is a new algorithm that addresses both. It is driven by a single parameter, the sampling cost. The cost is overestimated using the maximum within-group distance and the group cardinality. The algorithm is iterative: at each step a new representative is added, chosen by farthest-first traversal as the item farthest from the representative in the group with the highest probability of cost reduction. The algorithm is robust to noise and time optimized. A detailed comparison with alternative algorithms, conducted on various synthetic and real-world data sets, shows that the proposal yields competitive results in terms of quality of representation for clustering, sample size, and sampling time. (C) 2018 Elsevier Ltd. All rights reserved.
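
The iteration described in the abstract can be illustrated with a simplified sketch. It is not the authors' implementation: the abstract does not give the exact cost formula or the probability used to pick a group. The sketch assumes that a group's cost is its cardinality times its maximum within-group distance, divided by the data set size, and it splits the group with the largest cost at each step instead of drawing on a probability of cost reduction. The names `farthest_first_sample` and `max_cost` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_sample(points, max_cost, dist=euclidean, seed=0):
    """Return indices of representatives (simplified, assumption-laden sketch).

    Assumed cost of a group: size * max within-group distance / n.
    Assumed selection rule: always split the group with the largest cost.
    """
    n = len(points)
    first = random.Random(seed).randrange(n)
    reps = [first]
    assign = [0] * n  # position in `reps` of each point's representative
    d = [dist(p, points[first]) for p in points]  # distance to that representative

    while len(reps) < n:
        # Per-group cardinality and farthest member from the representative.
        size = [0] * len(reps)
        far_d = [0.0] * len(reps)
        far_i = list(reps)
        for i in range(n):
            g = assign[i]
            size[g] += 1
            if d[i] > far_d[g]:
                far_d[g], far_i[g] = d[i], i

        costs = [size[g] * far_d[g] / n for g in range(len(reps))]
        if sum(costs) <= max_cost:
            break  # the assumed sampling cost is within the budget

        # Farthest-first step inside the costliest group.
        g = max(range(len(reps)), key=costs.__getitem__)
        new = far_i[g]
        reps.append(new)
        new_g = len(reps) - 1

        # Reassign every point that is now closer to the new representative.
        for i in range(n):
            di = dist(points[i], points[new])
            if di < d[i]:
                d[i], assign[i] = di, new_g
    return reps


if __name__ == "__main__":
    data = [(0, 0), (0, 0.1), (0.1, 0), (10, 10), (10, 10.1), (10.1, 10)]
    print(farthest_first_sample(data, max_cost=0.5))
```

In this sketch a smaller `max_cost` yields more representatives, and `max_cost=0` keeps adding representatives until every distinct point is one. This matches the abstract's idea of a single cost parameter controlling the sample, under the assumptions stated above.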
