Journal: ACM transactions on intelligent systems

Exploiting Multilabel Information for Noise-Resilient Feature Selection

Abstract

In the conventional supervised learning paradigm, each data instance is associated with a single class label. Multilabel learning differs in that data instances may belong to multiple concepts simultaneously, a setting that arises naturally in a variety of high-impact domains, ranging from bioinformatics and information retrieval to multimedia analysis. It aims to leverage the multiple labels of data instances to build a predictive model that can classify unlabeled instances into one or more predefined target classes. In multilabel learning, even though each instance is associated with a rich set of class labels, the label information can be noisy and incomplete because the labeling process is both time-consuming and labor-intensive, leading to missing or even erroneous annotations. Noisy and missing labels can degrade the performance of the underlying learning algorithms. In addition, multilabeled data is often high-dimensional and contains noisy, irrelevant, and redundant features. These uninformative features may also reduce the predictive power of the learning model because of the curse of dimensionality. Feature selection, as an effective dimensionality reduction technique, has been shown to be powerful in preparing high-dimensional data for numerous data mining and machine learning tasks. However, the vast majority of existing multilabel feature selection algorithms either boil down to solving multiple single-label feature selection problems or directly use the imperfect labels to guide the selection of representative features. As a result, they may fail to obtain discriminative features shared across multiple labels.
In this article, to bridge the gap between the rich source of multilabel information and its imperfections in practice, we propose a novel noise-resilient multilabel informed feature selection framework (MIFS) that exploits the correlations among different labels. In particular, to reduce the negative effects of imperfect label information when obtaining label correlations, we decompose the multilabel information of data instances into a low-dimensional space and then use the reduced label representation to guide the feature selection phase via a joint sparse regression framework. Empirical studies on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of the proposed MIFS framework.
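
The two ingredients named in the abstract, a low-dimensional label representation and a sparse regression that guides feature selection, can be illustrated with a rough sketch. It is not the MIFS algorithm itself: the abstract describes a joint framework, whereas this sketch decouples the two steps, using a truncated SVD of the centered label matrix as a stand-in for the label decomposition and an L2,1-regularized least-squares fit (solved by iteratively reweighted least squares) as a stand-in for the sparse regression. The function names `latent_labels`, `l21_regression`, and `select_features`, and the parameters `k`, `gamma`, and `n_select`, are hypothetical and chosen only for illustration.

```python
import numpy as np

def latent_labels(Y, k):
    """Reduce an n x c label matrix to an n x k representation.

    Assumption: truncated SVD of the column-centered labels stands in
    for the label decomposition mentioned in the abstract.
    """
    Yc = Y - Y.mean(axis=0)
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :k] * s[:k]

def l21_regression(X, V, gamma=1.0, n_iter=50, eps=1e-8):
    """Approximately minimise ||XW - V||_F^2 + gamma * ||W||_{2,1}.

    Uses iteratively reweighted least squares: each pass solves a
    ridge-like system whose diagonal penalty is 1 / (2 * row norm).
    """
    d = X.shape[1]
    D = np.eye(d)                      # initial reweighting matrix
    XtX, XtV = X.T @ X, X.T @ V
    for _ in range(n_iter):
        W = np.linalg.solve(XtX + gamma * D, XtV)
        row_norms = np.sqrt((W ** 2).sum(axis=1))
        D = np.diag(1.0 / (2.0 * np.maximum(row_norms, eps)))
    return W

def select_features(X, Y, k=2, gamma=1.0, n_select=5):
    """Rank features by the row norms of W and return the top ones."""
    V = latent_labels(Y, k)            # reduced label representation
    W = l21_regression(X, V, gamma)    # d x k coefficient matrix
    scores = np.linalg.norm(W, axis=1) # one score per feature
    return np.argsort(-scores)[:n_select], scores
```

Calling `select_features(X, Y, k=3, n_select=10)` on an n x d feature matrix `X` and an n x c binary label matrix `Y` returns the indices of the ten highest-scoring features together with all d scores. Because the regression targets the reduced labels rather than the raw ones, a flipped entry in a single label column has less influence on the ranking, which is the intuition the abstract gives for noise resilience.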
