Journal: Expert Systems with Applications

MGFS: A multi-label graph-based feature selection algorithm via PageRank centrality

Abstract

In multi-label data, each instance is associated with a set of labels rather than a single label. In the data set, each label has its own column, in which instances that belong to the label are assigned 1 and instances that do not are assigned 0. Such data are usually high-dimensional, so many machine learning methods seek the best subset of features in order to reduce the dimensionality of the data and then build an acceptable classification model. In this paper, we design a fast feature selection algorithm for multi-label data based on the PageRank algorithm, an effective method for calculating the importance of web pages on the Internet. The algorithm, called multi-label graph-based feature selection (MGFS), first constructs an M × L matrix, called the Correlation Distance Matrix (CDM), where M is the number of features and L is the number of class labels. MGFS then creates a complete weighted graph, called the Feature-Label Graph (FLG), in which each feature is a vertex and the weight of the edge between two vertices (features) is the Euclidean distance between their rows in the CDM. Finally, the importance of each vertex (feature) is estimated with the PageRank algorithm. The number of features to select can be set by the user. To evaluate the proposed algorithm, we compared it with several multi-label feature selection methods on several multi-label datasets of different dimensions. The results show that the proposed method is superior in both classification criteria and run-time. © 2019 Elsevier Ltd. All rights reserved.
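The three steps named in the abstract (CDM, FLG, PageRank) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, because the abstract leaves several details open. The assumptions here are: the "correlation distance" is taken as 1 − |Pearson correlation| between a feature column and a label column; PageRank uses a damping factor of 0.85 and plain power iteration; and features with higher PageRank scores are selected first. The function names are hypothetical.

```python
import numpy as np


def correlation_distance_matrix(X, Y):
    """M x L matrix of 1 - |Pearson r| between features and labels.

    ASSUMPTION: the abstract does not define the correlation distance;
    1 - |r| is used here for illustration only.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    x_norm = np.linalg.norm(Xc, axis=0)
    y_norm = np.linalg.norm(Yc, axis=0)
    # Constant columns have undefined correlation; treat it as 0.
    x_norm[x_norm == 0] = 1.0
    y_norm[y_norm == 0] = 1.0
    corr = (Xc.T @ Yc) / np.outer(x_norm, y_norm)
    return 1.0 - np.abs(corr)


def feature_label_graph(cdm):
    """M x M weight matrix of the complete graph over features.

    The weight between two features is the Euclidean distance
    between their rows of the CDM.
    """
    diff = cdm[:, None, :] - cdm[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def weighted_pagerank(W, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank scores on a weighted graph via power iteration.

    ASSUMPTION: damping=0.85 is the conventional default, not a value
    taken from the paper.
    """
    m = W.shape[0]
    col_sums = W.sum(axis=0)
    safe = np.where(col_sums > 0, col_sums, 1.0)
    # Column-stochastic transition matrix; a vertex with no outgoing
    # weight jumps uniformly to every vertex.
    P = np.where(col_sums > 0, W / safe, 1.0 / m)
    r = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        r_new = damping * (P @ r) + (1.0 - damping) / m
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r / r.sum()


def select_features(X, Y, k):
    """Indices of the k features with the highest PageRank score.

    ASSUMPTION: a higher score means a more important feature.
    """
    cdm = correlation_distance_matrix(X, Y)
    W = feature_label_graph(cdm)
    scores = weighted_pagerank(W)
    return np.argsort(-scores)[:k]
```

Here `X` is an instances × features array, `Y` is the instances × labels 0/1 matrix described in the abstract, and `k` is the user-chosen number of features. Building the full M × M weight matrix costs O(M²L) time and O(M²) memory, so a very large feature set would need a more economical construction.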