Journal: CAAI Transactions on Intelligent Systems (智能系统学报)

Two-Pass Learning Algorithm for AUC Optimization


Abstract

The area under the ROC curve (AUC) is an important performance criterion in machine learning and is widely used in tasks such as class-imbalanced learning, cost-sensitive learning, and learning to rank. Because AUC is defined over pairs of positive and negative instances, traditional AUC optimization methods must store the entire dataset and therefore do not scale to big data. To handle large-scale problems, the one-pass AUC (OPAUC) algorithm was proposed previously: it scans the data only once and optimizes AUC by storing first- and second-order statistics. In many real applications, however, handling the second-order statistics still incurs high storage and computational costs, especially for high-dimensional data. This paper proposes a new two-pass algorithm, TPAUC (two-pass AUC optimization). The first pass over the data computes the means of the positive and negative instances; the second pass optimizes AUC with stochastic gradient descent. Because only first-order statistics need to be stored, TPAUC avoids storing and computing second-order statistics, which improves efficiency. Experiments verify the effectiveness of the proposed algorithm.
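The abstract does not spell out TPAUC's exact objective, so the following is only a minimal sketch of why class means can be enough, under an assumption: that the surrogate is the pairwise square loss (1 - w·(x_i - x_j))^2 used in the one-pass OPAUC work. Writing c+ and c- for the positive and negative means, the average of this loss over all positive-negative pairs equals exactly (1 - w·(c+ - c-))^2 + mean over positives of (w·(x_i - c+))^2 + mean over negatives of (w·(x_j - c-))^2, because the centered cross terms average to zero. Once pass 1 has produced the means, pass 2 can take SGD steps on per-instance terms without ever forming a d×d covariance matrix. The function names, the per-instance reweighting by N/N_y, the step size `eta`, the regularizer `lam`, and the synthetic demo data are all hypothetical illustrations, not details taken from the paper.

```python
import numpy as np


def first_pass(X, y):
    """Pass 1: a single scan accumulating only first-order statistics
    (per-class sums and counts); returns the class means and counts."""
    d = X.shape[1]
    sums = {1: np.zeros(d), -1: np.zeros(d)}
    counts = {1: 0, -1: 0}
    for x, label in zip(X, y):
        sums[int(label)] += x
        counts[int(label)] += 1
    return sums[1] / counts[1], sums[-1] / counts[-1], counts[1], counts[-1]


def second_pass(X, y, c_pos, c_neg, n_pos, n_neg, eta=0.001, lam=1e-3, seed=0):
    """Pass 2: one SGD sweep over per-instance terms of the decomposed
    pairwise square loss (an ASSUMED objective); no d x d matrix is formed."""
    n, d = X.shape
    w = np.zeros(d)
    delta = c_pos - c_neg
    rng = np.random.default_rng(seed)
    for t in rng.permutation(n):
        x, label = X[t], y[t]
        z = x - (c_pos if label == 1 else c_neg)
        # Hypothetical reweighting by N / N_y so that the average of the
        # per-instance losses equals the class-balanced pairwise objective.
        weight = n / (n_pos if label == 1 else n_neg)
        grad = (-2.0 * (1.0 - w @ delta) * delta
                + 2.0 * weight * (w @ z) * z
                + lam * w)
        w -= eta * grad
    return w


def pairwise_square_loss(w, X, y):
    """Brute-force O(N+ * N-) average of (1 - w.(x_i - x_j))^2, for checking."""
    sp, sn = X[y == 1] @ w, X[y == -1] @ w
    return np.mean((1.0 - (sp[:, None] - sn[None, :])) ** 2)


def decomposed_loss(w, X, y, c_pos, c_neg):
    """The same quantity, computed from the class means and per-instance terms."""
    m = w @ (c_pos - c_neg)
    var_pos = np.mean(((X[y == 1] - c_pos) @ w) ** 2)
    var_neg = np.mean(((X[y == -1] - c_neg) @ w) ** 2)
    return (1.0 - m) ** 2 + var_pos + var_neg


def auc(scores, y):
    """Fraction of positive-negative pairs ranked correctly (ties count 1/2)."""
    sp, sn = scores[y == 1], scores[y == -1]
    diff = sp[:, None] - sn[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


# Demo on synthetic imbalanced data (10% positives); values are illustrative only.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(1.0, 1.0, (100, 5)), rng.normal(-1.0, 1.0, (900, 5))])
y = np.array([1] * 100 + [-1] * 900)
c_pos, c_neg, n_pos, n_neg = first_pass(X, y)
w = second_pass(X, y, c_pos, c_neg, n_pos, n_neg)
print("training AUC:", auc(X @ w, y))
```

The brute-force `pairwise_square_loss` is included only to check the decomposition on small data; in a genuinely large-scale setting only `first_pass` and `second_pass` would run, each reading the data once and keeping O(d) state.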
