World Multi-Conference on Systemics, Cybernetics and Informatics

Data Distribution Assessment and Optimal Splitting of Data Sets


Abstract

A new method is proposed for assessing the quality of a data distribution based on the Kullback-Leibler (KL) divergence. The probability density function (pdf) of the data is estimated with a kernel density estimator. In the absence of prior knowledge, the target distribution is assumed to be uniform. Monte Carlo sampling of the estimated pdf then allows the KL divergence to be approximated, and this divergence serves as a criterion for the space-filling properties of the data distribution. The KL-based criterion has a wide range of applications. Sobol sequences and maximin Latin hypercubes, the designs most frequently applied for space-filling design of experiments, are compared. Finally, strategies for optimally splitting data sets are discussed and illustrated.
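A minimal sketch of how such a KL criterion could be computed, in Python with NumPy. Several details are assumptions not stated in the abstract: the data are already scaled to the unit hypercube, the kernel is an isotropic Gaussian with a Scott's-rule bandwidth based on the standard deviation of the uniform target (1/sqrt(12)), the KDE is renormalized to the cube because kernel mass leaks past its borders, and the integral is estimated by evaluating the KDE at uniformly drawn Monte Carlo points. The function names `kde_logpdf` and `kl_to_uniform` are hypothetical; this is an illustration, not the authors' implementation. Lower values indicate a more uniform, better space-filling distribution.

```python
import numpy as np


def kde_logpdf(points, data, bandwidth):
    """Log-density of an isotropic Gaussian KDE built on `data`, evaluated at `points`."""
    n, d = data.shape
    # Scaled squared distances between every evaluation point and every data point.
    diff = points[:, None, :] - data[None, :, :]
    exponents = -0.5 * np.sum(diff ** 2, axis=-1) / bandwidth ** 2
    # log[(1/n) * sum_i N(x; x_i, h^2 I)], computed with log-sum-exp for stability.
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2) - np.log(n)
    m = exponents.max(axis=1, keepdims=True)
    return log_norm + m[:, 0] + np.log(np.exp(exponents - m).sum(axis=1))


def kl_to_uniform(data, n_mc=10_000, bandwidth=None, seed=0):
    """Monte Carlo estimate of KL(p_hat || U[0,1]^d) for points scaled to the unit cube."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    if bandwidth is None:
        # Assumption: Scott's rule using the std of the uniform target, 1/sqrt(12),
        # so that designs of equal size are compared with the same bandwidth.
        bandwidth = n ** (-1.0 / (d + 4)) / np.sqrt(12.0)
    rng = np.random.default_rng(seed)
    # Uniform Monte Carlo points over the unit hypercube (volume 1, so u(x) = 1).
    x = rng.random((n_mc, d))
    log_p = kde_logpdf(x, data, bandwidth)
    p = np.exp(log_p)
    # Renormalize the KDE to the cube: z approximates the KDE mass inside [0, 1]^d.
    z = p.mean()
    q = p / z
    # KL(q || u) = integral of q(x) * log q(x) over the cube, estimated by the sample mean.
    return float(np.mean(q * (log_p - np.log(z))))


# Usage: a regular 10 x 10 grid versus 100 points crowded into one corner.
grid = [[(i + 0.5) / 10, (j + 0.5) / 10] for i in range(10) for j in range(10)]
clustered = 0.3 * np.random.default_rng(1).random((100, 2))
print(kl_to_uniform(grid), kl_to_uniform(clustered))  # the clustered set scores higher
```

Because the estimate is the sample mean of q log q with the sample mean of q fixed at 1, it is never negative, matching the non-negativity of the KL divergence. Designs such as Sobol sequences or Latin hypercubes (for example from `scipy.stats.qmc`) could be passed to `kl_to_uniform` in the same way to compare them.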
