Journal: Entropy

How Is a Data-Driven Approach Better than Random Choice in Label Space Division for Multi-Label Classification?

Abstract

We propose using five data-driven community detection approaches from social networks to partition the label space in multi-label classification, as an alternative to the random partitioning into equal subsets performed by RAkELd. We evaluate modularity maximization using the fast greedy and leading eigenvector approximations, as well as the infomap, walktrap and label propagation algorithms. For this purpose, we construct a label co-occurrence graph (in both weighted and unweighted versions) from the training data and perform community detection on it to partition the label set. Each partition then constitutes the label space of a separate multi-label classification sub-problem. As a result, we obtain an ensemble of multi-label classifiers that jointly covers the whole label space. Based on the binary relevance and label powerset classification methods, we compare label space divisions produced by community detection against random baselines on 12 benchmark datasets over five evaluation measures. We find that data-driven approaches are more efficient and more likely to outperform RAkELd than binary relevance or label powerset is, in every evaluated measure. For all measures apart from Hamming loss, data-driven approaches are significantly better than RAkELd (α = 0.05), and at least one data-driven approach is more likely to outperform RAkELd than a priori methods in the case of RAkELd's best performance. This is the largest RAkELd evaluation published to date, with 250 samplings per value for 10 values of the RAkELd parameter k on 12 datasets.
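
The pipeline described in the abstract (build a label co-occurrence graph from training data, detect communities, then treat each community as its own label space) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it uses only the Python standard library, the function names (`cooccurrence_graph`, `label_propagation`, `split_targets`) and the toy data are hypothetical, and the community detection step is a simplified, deterministic variant of label propagation chosen so that the example is reproducible.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(label_sets, weighted=True):
    """Build {label: {neighbour: weight}} from training label sets.

    Two labels are linked if they appear together in at least one
    example; in the weighted version the weight counts such examples.
    """
    graph = {}
    for labels in label_sets:
        unique = sorted(set(labels))
        for label in unique:
            graph.setdefault(label, {})  # keep labels that never co-occur
        for a, b in combinations(unique, 2):
            w = graph[a].get(b, 0) + 1 if weighted else 1
            graph[a][b] = w
            graph[b][a] = w
    return graph


def label_propagation(graph, max_rounds=100):
    """Simplified, deterministic label propagation (illustrative only).

    Each node starts in its own community and adopts the community with
    the largest total edge weight among its neighbours. On a tie the
    node keeps its current community if possible, else the smallest id.
    """
    community = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated label stays on its own
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[community[neighbour]] += weight
            top = max(votes.values())
            best = sorted(c for c, v in votes.items() if v == top)
            choice = community[node] if community[node] in best else best[0]
            if choice != community[node]:
                community[node] = choice
                changed = True
        if not changed:
            break
    groups = {}
    for node, comm in community.items():
        groups.setdefault(comm, []).append(node)
    return sorted(sorted(members) for members in groups.values())


def split_targets(label_sets, partition):
    """Project every example's labels onto each part of the partition."""
    return [
        [sorted(set(labels) & set(part)) for labels in label_sets]
        for part in partition
    ]


# Toy training labels (invented for illustration).
train_labels = [
    ["beach", "sea"], ["beach", "sea", "sun"], ["sea", "sun"],
    ["city", "street"], ["city", "street", "car"], ["car", "street"],
    ["sun", "city"],
]
graph = cooccurrence_graph(train_labels, weighted=True)
partition = label_propagation(graph)
sub_problems = split_targets(train_labels, partition)
print(partition)
```

Each entry of `sub_problems` holds the target labels for one sub-problem; a separate multi-label classifier (for example binary relevance or label powerset) would be trained on each, and their predictions combined to cover the whole label space.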