International Conference on Discovery Science

Hierarchy Decomposition Pipeline: A Toolbox for Comparison of Model Induction Algorithms on Hierarchical Multi-label Classification Problems

Abstract

Hierarchical multi-label classification (HMC) is a supervised machine learning task in which each example can be assigned more than one label and the possible labels are organized in a hierarchy. HMC problems arise in domains such as functional genomics, habitat modelling, and text and image categorization. They can be addressed with global model induction algorithms, which induce a single model that predicts the complete hierarchy, or with local algorithms, which induce multiple models that each predict a different segment of the hierarchy. However, there is no consensus on which of these approaches performs best across domains, especially when learning ensembles. We introduce the hierarchy decomposition pipeline, a publicly available toolbox for comparing model induction algorithms on HMC problems in an ensemble setting. The pipeline includes five algorithms, among them one that predicts the complete hierarchy and others that perform partial or complete decompositions of the hierarchy. One of these is the novel "label specialization" algorithm, which constructs, for each parent label in the hierarchy, a local multi-label classification model that simultaneously predicts that parent's children labels. We apply the pipeline to ten HMC data sets from four domains, covering both tree-shaped and directed acyclic graph (DAG) label hierarchies, and confirm that no single algorithm is best for all HMC problems. This finding shows the need for a pipeline that lets users choose the best-performing algorithm for their own HMC data set. Finally, we show that the choice can be narrowed down to a specific type of algorithm based on the characteristics of the label hierarchy and the label cardinality of the data set.
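To make the label specialization idea more concrete, the following is a minimal Python sketch of how a label hierarchy might be decomposed into one local multi-label task per parent label, together with the label cardinality statistic (average number of labels per example) mentioned in the abstract. It is an illustration only, not the toolbox's code: the dict-based hierarchy, the function names `label_specialization_tasks` and `label_cardinality`, the implicit `root` node, and the rule that a parent's local training set contains only the examples annotated with that parent are all assumptions. Any multi-label learner, for example a tree ensemble, could then be fitted to each returned task.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: parent label -> list of its child labels.
Hierarchy = Dict[str, List[str]]
# One local task: (features, 0/1 target matrix, names of the target labels).
LocalTask = Tuple[List[List[float]], List[List[int]], List[str]]


def label_cardinality(label_sets: List[Set[str]]) -> float:
    """Average number of labels assigned per example."""
    if not label_sets:
        return 0.0
    return sum(len(labels) for labels in label_sets) / len(label_sets)


def label_specialization_tasks(
    hierarchy: Hierarchy,
    X: List[List[float]],
    label_sets: List[Set[str]],
    root: str = "root",
) -> Dict[str, LocalTask]:
    """Build one local multi-label task per parent label (a sketch).

    Assumption: a parent's local training set holds only the examples
    annotated with that parent (every example belongs to the implicit root),
    and the targets are 0/1 indicators of the parent's children.
    """
    tasks: Dict[str, LocalTask] = {}
    for parent, children in hierarchy.items():
        if not children:  # leaves have nothing to specialize into
            continue
        idx = [i for i, labels in enumerate(label_sets)
               if parent == root or parent in labels]
        X_local = [X[i] for i in idx]
        Y_local = [[int(child in label_sets[i]) for child in children]
                   for i in idx]
        tasks[parent] = (X_local, Y_local, list(children))
    return tasks


# Toy DAG hierarchy (hypothetical): label "c" has two parents, "a" and "b".
hierarchy: Hierarchy = {"root": ["a", "b"], "a": ["c"], "b": ["c", "d"]}
X = [[0.1, 1.0], [0.5, 0.2], [0.9, 0.7]]
label_sets = [{"a", "c"}, {"b", "d"}, {"a", "b", "c"}]

tasks = label_specialization_tasks(hierarchy, X, label_sets)
for parent, (X_local, Y_local, targets) in tasks.items():
    print(parent, targets, Y_local)
print("label cardinality:", round(label_cardinality(label_sets), 3))
```

In a DAG, a label with several parents (here `c`) simply appears as a target in each parent's local task, so the same decomposition applies to tree and DAG hierarchies alike. At prediction time, one natural choice (again an assumption) is to apply the local models top-down, starting from the root.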