
Supervised feature ranking using a genetic algorithm optimized artificial neural network


Abstract

A genetic algorithm optimized artificial neural network (GNW) has been designed to rank features for two diverse multivariate data sets. The data sets have dimensions 85 x 24 and 62 x 25: 24 molecular descriptors computed for 85 matrix metalloproteinase-1 inhibitors, and 25 molecular descriptors computed for 62 hepatitis C virus NS3 protease inhibitors, respectively. Each molecular descriptor is treated as a feature and fed to one input layer node of the artificial neural network. To optimize the network with the genetic algorithm, each connection weight between input and hidden layer nodes, or between hidden and output layer nodes, is binary encoded as a 16-bit string in a chromosome, and the chromosome is evolved by crossover and mutation operations. Each input layer node of the trained GNW is systematically omitted once, together with its associated weights (the self-depleted weights), and the weight adjustments needed to keep the overall network behavior unchanged after the omission are computed. The primary feature ranking index, defined as the sum of the self-depleted weights and the corresponding weight adjustments, is found capable of separating good from bad features on several artificial data sets with known feature rankings. The final feature indexes used to rank the data sets are computed, after partitioning each data set into numerous clusters, as the sum of the weighted frequencies with which each feature occupies a particular rank. The two data sets are also clustered by a standard K-means method and trained by a support vector machine (SVM) for feature ranking, using the computed F-scores as the feature ranking index. GNW outperforms the SVM method on three artificial data sets as well as on the matrix metalloproteinase-1 inhibitor data set. For a feature pool of known feature ranking, the GNW offers a clear-cut separation of good from bad features, whereas the SVM method does not.
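The chromosome encoding described in the abstract can be illustrated with a minimal sketch. Only the 16-bit width and the use of crossover and mutation come from the abstract. The weight range `[-1, 1]`, the single-point crossover, the per-bit mutation rate, and all function names are assumptions made for illustration, not the author's implementation.

```python
import random

BITS = 16                  # bits per weight, as stated in the abstract
W_MIN, W_MAX = -1.0, 1.0   # assumed weight range (not given in the source)
LEVELS = (1 << BITS) - 1   # number of quantization steps for one weight


def encode(weights):
    """Quantize each weight to 16 bits and concatenate into one chromosome."""
    genes = []
    for w in weights:
        w = min(max(w, W_MIN), W_MAX)  # clip to the assumed range
        q = round((w - W_MIN) / (W_MAX - W_MIN) * LEVELS)
        genes.append(f"{q:0{BITS}b}")
    return "".join(genes)


def decode(chrom):
    """Split a chromosome into 16-bit genes and map them back to weights."""
    return [
        W_MIN + int(chrom[i:i + BITS], 2) / LEVELS * (W_MAX - W_MIN)
        for i in range(0, len(chrom), BITS)
    ]


def crossover(a, b, rng):
    """Single-point crossover (assumed variant): swap tails at a random cut."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(chrom, rate, rng):
    """Flip each bit independently with probability `rate` (assumed variant)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chrom
    )


if __name__ == "__main__":
    rng = random.Random(0)
    parent_a = encode([0.5, -0.25, 1.0])
    parent_b = encode([-0.9, 0.1, 0.3])
    child, _ = crossover(parent_a, parent_b, rng)
    print(decode(mutate(child, 0.01, rng)))
```

In a full genetic algorithm, each decoded weight vector would be loaded into the network and scored by a fitness function such as the training error. That step is omitted here because the abstract does not specify it.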
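The abstract names F-scores as the feature ranking index for the SVM baseline but does not give the formula. The sketch below assumes the commonly used two-class F-score: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names and the toy data are hypothetical.

```python
def f_score(pos, neg):
    """Two-class F-score of one feature (assumed definition, see lead-in)."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Unbiased sample variance within each class.
    var_pos = sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = var_pos + var_neg
    return between / within if within > 0 else float("inf")


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y != 1]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 does not.
    rows = [[1, 1], [2, 5], [3, 9], [7, 2], [8, 5], [9, 8]]
    labels = [1, 1, 1, 0, 0, 0]
    print(rank_features(rows, labels))
```

A larger F-score means the feature's class means lie further apart relative to the spread within each class, so such features rank higher.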

Bibliographic details

  • Author: Lin TH
  • Year: 2010
  • Original format: PDF
  • Language: English
