Knowledge and Information Systems

Query-learning-based iterative feature-subset selection for learning from high-dimensional data sets


Abstract

We propose a new data-mining method that is effective for learning from extremely high-dimensional data sets. The method selects a subset of features from a high-dimensional data set by a process of iterative refinement. Each selection of a feature subset has two steps. The first step selects, from the data set, the subset of instances for which the predictions of previously obtained hypotheses are most unreliable. The second step selects the subset of features whose values in the selected instances vary the most from their values in all instances of the database. We empirically evaluate the effectiveness of the proposed method by comparing its performance with that of four other methods, including one of the latest feature-subset selection methods. The evaluation was performed on a real-world data set with approximately 140,000 features. Our results show that the proposed method outperforms the other methods in terms of prediction accuracy, precision at a certain recall value, and computation time to reach a certain prediction accuracy. We have also examined the effect of noise in the data and found that the advantage of the proposed method becomes more pronounced at larger noise levels. Extended abstracts of parts of the work presented in this paper have appeared in Mamitsuka [14] and Mamitsuka [15].
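
The two-step selection loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how unreliability or variation is measured, which base learner is used, or how the first feature subset is chosen, so everything concrete here is an assumption made for illustration: unreliability is taken as a small vote margin among the earlier hypotheses, variation as the absolute difference between a feature's mean over the selected instances and its mean over all instances, the base learner is a hypothetical nearest-centroid classifier over labels +1/-1, and the first subset is drawn at random. All function names are hypothetical and this is not the authors' implementation.

```python
import random

def vote_margin(hypotheses, x):
    """Assumed unreliability measure: |sum of +1/-1 votes|; small means unreliable."""
    return abs(sum(h(x) for h in hypotheses))

def select_uncertain_instances(hypotheses, X, n_instances):
    """Step 1: indices of the instances with the smallest vote margin."""
    order = sorted(range(len(X)), key=lambda i: vote_margin(hypotheses, X[i]))
    return order[:n_instances]

def select_features(X, instance_idx, n_features):
    """Step 2: features whose mean over the selected instances differs most
    from their mean over all instances (an assumed notion of 'vary the most')."""
    all_idx = range(len(X))

    def mean(rows, j):
        return sum(X[i][j] for i in rows) / len(rows)

    scores = [abs(mean(instance_idx, j) - mean(all_idx, j))
              for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:n_features])

def train_centroid(X, y, features):
    """Hypothetical base learner: nearest centroid restricted to `features`.
    Assumes both labels, +1 and -1, occur in y."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in features]

    pos, neg = centroid(1), centroid(-1)

    def h(x):
        d_pos = sum((x[j] - c) ** 2 for j, c in zip(features, pos))
        d_neg = sum((x[j] - c) ** 2 for j, c in zip(features, neg))
        return 1 if d_pos <= d_neg else -1

    return h

def iterative_selection(X, y, rounds, n_instances, n_features, seed=0):
    """Iterative refinement: each round picks uncertain instances, then the
    features that stand out in them, then trains a new hypothesis."""
    rng = random.Random(seed)
    # Assumption: the first feature subset is random, since no hypothesis exists yet.
    features = sorted(rng.sample(range(len(X[0])), n_features))
    hypotheses = [train_centroid(X, y, features)]
    for _ in range(rounds - 1):
        idx = select_uncertain_instances(hypotheses, X, n_instances)
        features = select_features(X, idx, n_features)
        hypotheses.append(train_centroid(X, y, features))
    return hypotheses

def predict(hypotheses, x):
    """Majority vote of all hypotheses obtained so far (ties go to +1)."""
    return 1 if sum(h(x) for h in hypotheses) >= 0 else -1
```

With `X` as a list of equal-length numeric rows and `y` as a list of +1/-1 labels, `iterative_selection(X, y, rounds=5, n_instances=50, n_features=100)` would return a committee that `predict` can query. Training every hypothesis on all instances restricted to the chosen features is a simplification of this sketch, not something the abstract states.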
