Conference: 2012 Fourth International Symposium on Information Science and Engineering

An Improved Method for Combination Feature Selection in Web Click-Through Data Mining

Abstract

An important way to analyze web click-through data is to build a two-class linear classifier and select the key subset of user features that mainly determines the click outcome. In many circumstances, however, the fitting accuracy is poor because the model considers only the original features. We often add combination features, which are products of the original features, to the classifier model to improve accuracy. Combination features, however, cause a serious problem: they dramatically increase the number of features, an effect called "feature dimension explosion". Traditional algorithms can hardly cope with this because they require all features as input at the start of processing. The Grafting method offers an incremental solution that adds only one feature at a time. However, Grafting is very inefficient when the feature space is huge and sparse. In this paper, we propose an adaptive Grafting algorithm and a PV filter method to solve the feature dimension explosion problem. Our algorithm significantly improves computational efficiency by reducing the number of model optimization steps, and it shrinks the feature space with a very simple filter strategy so that the algorithm works effectively. Our experiments on real data show that the adaptive Grafting algorithm and the PV filter method make it easy to generate and select combination features, which significantly raises the fitting accuracy of the model.
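The abstract does not spell out how the adaptive Grafting algorithm or the PV filter work, so the sketch below illustrates only the two baseline ideas it builds on: generating pairwise product (combination) features from sparse binary click data, and plain Grafting-style incremental selection, which adds the inactive feature with the largest loss gradient while that gradient exceeds the L1 penalty. Everything here is an assumption for illustration, not the paper's implementation: the logistic-regression loss, the set-of-active-features representation, the hypothetical function names, the penalty `lam`, the use of proximal gradient descent (ISTA) to re-optimize the active weights, and the toy feature names (`mobile`, `evening`, `sports`) and labels.

```python
import math
from itertools import combinations


def expand_with_combinations(samples):
    """Add pairwise product (combination) features to sparse binary samples.

    Each sample is the set of its active (value 1) feature names. The product
    of two binary features is 1 only when both are active, so "a*b" is added
    exactly for the pairs that co-occur in that sample.
    """
    return [set(feats) | {f"{a}*{b}" for a, b in combinations(sorted(feats), 2)}
            for feats in samples]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, feats):
    # Click probability of one sample; features outside the model count as 0.
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


def optimize_active(samples, labels, weights, bias, lam, lr=0.5, iters=200):
    """Proximal gradient descent (ISTA) on mean logistic loss + lam * L1,
    updating only the features already in `weights` (the active set)."""
    n = len(samples)
    for _ in range(iters):
        grad, grad_b = dict.fromkeys(weights, 0.0), 0.0
        for feats, y in zip(samples, labels):
            r = predict(weights, bias, feats) - y  # d(loss)/d(logit)
            grad_b += r
            for f in feats:
                if f in grad:
                    grad[f] += r
        bias -= lr * grad_b / n  # the bias is not penalized
        for f in weights:
            w = weights[f] - lr * grad[f] / n
            # Soft-thresholding is the proximal step for the L1 penalty.
            weights[f] = math.copysign(max(abs(w) - lr * lam, 0.0), w)
    return weights, bias


def graft(samples, labels, lam, max_features=10):
    """Grafting-style selection: repeatedly add the inactive feature whose
    loss gradient at w = 0 is largest in magnitude, while it exceeds lam."""
    n = len(samples)
    pos = sum(labels) / n
    # Start from the best bias-only model: the log-odds of the click rate.
    bias = math.log(pos / (1 - pos)) if 0 < pos < 1 else 0.0
    weights = {}  # insertion order records the selection order
    while len(weights) < max_features:
        grad = {}
        for feats, y in zip(samples, labels):
            r = predict(weights, bias, feats) - y
            for f in feats:  # only samples where f is active contribute
                if f not in weights:
                    grad[f] = grad.get(f, 0.0) + r / n
        if not grad:
            break
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(grad), key=lambda f: abs(grad[f]))
        if abs(grad[best]) <= lam:
            break  # stop rule: no inactive feature passes the gradient test
        weights[best] = 0.0
        weights, bias = optimize_active(samples, labels, weights, bias, lam)
    return weights, bias


# Toy data (hypothetical): a click happens only when "mobile" and "evening"
# co-occur in the same sample.
RAW_SAMPLES = [{"mobile", "evening"}, {"mobile", "evening", "sports"},
               {"mobile"}, {"evening"}, {"mobile", "sports"},
               {"evening", "sports"}, {"sports"}, set()]
LABELS = [1, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    samples = expand_with_combinations(RAW_SAMPLES)
    weights, bias = graft(samples, LABELS, lam=0.01)
    for name, w in weights.items():  # printed in selection order
        print(f"{name:16s} {w:+.3f}")
```

On this toy data the combination `evening*mobile` has the largest initial gradient (0.1875 in magnitude, versus 0.125 for `mobile` or `evening` alone), so it is the first feature grafted in. A sample with k active features yields k(k-1)/2 products, which is the "feature dimension explosion" the abstract refers to, and every grafting step re-runs the optimizer over the whole active set, which is the cost the paper's adaptive variant is described as reducing.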