Journal of Classification

MDCGen: Multidimensional Dataset Generator for Clustering

Abstract

We present a tool for generating multidimensional synthetic datasets for testing, evaluating, and benchmarking unsupervised classification algorithms. Our proposal fills a gap in previous approaches regarding the underlying distributions used to create multidimensional clusters. As a novelty, normal and non-normal distributions can be combined either to define values independently feature by feature (i.e., multivariate distributions) or to establish overall intra-cluster distances. Highly flexible, parameterizable, and randomizable, MDCGen also implements the features commonly sought in such generators: (a) customization of cluster separation, (b) overlap control, (c) addition of outliers and noise, (d) definition of correlated variables and rotations, (e) flexibility for allowing or avoiding isolation constraints per dimension, (f) creation of subspace clusters and subspace outliers, (g) importing arbitrary distributions for the value generation, and (h) dataset quality evaluations, among others. As a result, the proposed tool offers an improved range of potential datasets for more comprehensive testing of clustering algorithms.
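To make the feature-by-feature idea concrete, the following is a minimal sketch in plain Python, not MDCGen's actual interface. It places a few cluster centroids in the unit hypercube, picks a normal or non-normal distribution independently for each feature of each cluster, and adds uniformly scattered noise points labelled -1. The function name `make_clusters`, its parameters, and the menu of distributions are all hypothetical choices made for illustration; separation and overlap control, correlations, rotations, and subspace clusters are deliberately left out.

```python
import random


def make_clusters(n_clusters=3, n_features=2, points_per_cluster=100,
                  n_outliers=10, spread=0.03, seed=0):
    """Toy generator (illustrative only, not the MDCGen API).

    Returns (points, labels): points is a list of feature lists,
    labels holds the cluster index of each point, or -1 for noise.
    """
    rng = random.Random(seed)

    # Hypothetical menu of zero-centred samplers, one normal and two non-normal.
    samplers = {
        "gaussian": lambda: rng.gauss(0.0, spread),
        "uniform": lambda: rng.uniform(-spread, spread),
        "triangular": lambda: rng.triangular(-spread, spread, 0.0),
    }
    names = sorted(samplers)

    points, labels = [], []
    for k in range(n_clusters):
        # Centroids are kept away from the borders of the unit hypercube.
        centroid = [rng.uniform(0.2, 0.8) for _ in range(n_features)]
        # Each feature of this cluster gets its own distribution.
        chosen = [samplers[rng.choice(names)] for _ in range(n_features)]
        for _ in range(points_per_cluster):
            points.append([c + draw() for c, draw in zip(centroid, chosen)])
            labels.append(k)

    # Noise points are drawn uniformly, so some may land inside a cluster;
    # this sketch does not enforce any distance from the clusters.
    for _ in range(n_outliers):
        points.append([rng.uniform(0.0, 1.0) for _ in range(n_features)])
        labels.append(-1)

    return points, labels


X, y = make_clusters()
print(len(X), len(y))  # 310 310
```

Fixing `seed` makes the output reproducible, which matters when the same synthetic dataset is reused to compare several clustering algorithms.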
