Journal: Electronic Notes in Theoretical Computer Science

A Framework to Generate Synthetic Multi-label Datasets

Abstract

A controlled environment, built on known properties of the dataset that a learning algorithm uses, is useful for evaluating machine learning algorithms empirically. Synthetic (artificial) datasets serve this purpose. Frameworks for generating synthetic single-label datasets are publicly available, but this is not the case for multi-label datasets, in which each instance is associated with a set of labels that are usually correlated. This work presents Mldatagen, a multi-label dataset generator framework that we have implemented and made publicly available to the community. Mldatagen currently implements two strategies: hypersphere and hypercube. For each label in the multi-label dataset, these strategies randomly generate a geometric shape (a hypersphere or a hypercube) and populate it with randomly generated points (instances). Each instance is then labeled according to the shapes it belongs to, which defines its multi-label. Experiments with a multi-label classification algorithm on six synthetic datasets illustrate the use of Mldatagen.
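
The hypersphere strategy described in the abstract can be sketched roughly as follows. This is a minimal illustration written from the abstract alone, not Mldatagen's actual code: the function name `generate_hypersphere_dataset`, its parameters, the ranges used for centers and radii, and the choice of uniform sampling inside each ball are all assumptions made for the example.

```python
import numpy as np

def generate_hypersphere_dataset(n_labels, n_features, n_per_label, seed=0):
    """Hypothetical sketch: one random hypersphere per label, filled with random points."""
    rng = np.random.default_rng(seed)

    # Random center and radius for each label's hypersphere (ranges are arbitrary assumptions).
    centers = rng.uniform(-1.0, 1.0, size=(n_labels, n_features))
    radii = rng.uniform(0.5, 1.0, size=n_labels)

    points = []
    for center, radius in zip(centers, radii):
        # Uniform sampling inside a ball: random unit direction scaled by radius * u^(1/d).
        direction = rng.normal(size=(n_per_label, n_features))
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        scale = radius * rng.uniform(size=(n_per_label, 1)) ** (1.0 / n_features)
        points.append(center + direction * scale)
    X = np.vstack(points)

    # An instance receives label j when it lies inside hypersphere j,
    # so points in overlapping regions end up with several labels.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Y = (dists <= radii[None, :]).astype(int)
    return X, Y, centers, radii

if __name__ == "__main__":
    X, Y, centers, radii = generate_hypersphere_dataset(n_labels=3, n_features=2, n_per_label=100)
    print("instances:", X.shape, "label matrix:", Y.shape)
    print("average labels per instance:", Y.sum(axis=1).mean())
```

Because the shapes are placed at random, overlapping regions produce instances with more than one label, which is one simple way to obtain correlated labels. A hypercube variant under the same assumptions would replace the distance test with a per-coordinate bounds check.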
