Feature Selection-Ranking Methods in a Very Large Electric Database

Abstract

Feature selection is a crucial activity when knowledge discovery is applied to very large databases, as it reduces dimensionality and therefore the complexity of the problem. Its main objective is to eliminate attributes so that the problem becomes computationally tractable without affecting the quality of the solution. Several feature selection methods have been proposed, some of them tested on small academic datasets. In this paper we evaluate different feature selection-ranking methods on a very large real-world database related to a Mexican electric energy client-invoice system. Most research on feature selection methods evaluates only accuracy and processing time; here we also report on the amount of discovered knowledge and stress the issue of the boundary that separates relevant from irrelevant features. The evaluation was done using the Elvira and Weka tools, which integrate and implement state-of-the-art data mining algorithms. Finally, we propose a promising feature selection heuristic based on the experiments performed.
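The idea of ranking features and then drawing a boundary between relevant and irrelevant ones can be illustrated with a minimal sketch. The abstract does not name the ranking criteria that were evaluated, so information gain is used here only as one common choice. The column names, toy rows, class labels, and the `threshold` value are all hypothetical and are not taken from the paper's database.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on one discrete attribute."""
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = [(name, information_gain([row[i] for row in rows], labels))
              for i, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data invented for illustration; not from the paper.
names = ["tariff", "region", "meter_type"]
rows = [
    ("A", "north", "x"),
    ("A", "south", "y"),
    ("B", "north", "x"),
    ("B", "south", "y"),
]
labels = ["class_a", "class_a", "class_b", "class_b"]

ranking = rank_features(rows, labels, names)

# Hypothetical cut-off: features scoring above it are kept as "relevant".
threshold = 0.1
selected = [name for name, score in ranking if score > threshold]
```

In this toy case only `tariff` separates the two classes, so it is the only feature that survives the cut-off. On real data the scores rarely show such a clean gap, which is why the placement of the boundary is a question in its own right.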