A clustering-based feature selection method for automatically generated relational attributes

Rezaei Mostafa; Cribben Ivor; Samorani Michele

Abstract

Although data mining problems require a flat mining table as input, in many real-world applications analysts are interested in finding patterns in a relational database. To this end, new methods and software have recently been developed that automatically add attributes (or features) to a target table of a relational database; these attributes summarize information from all other tables. When attributes are automatically constructed by these methods, selecting the important attributes is particularly difficult, because a large number of the attributes are highly correlated. In this setting, attribute selection techniques such as the Least Absolute Shrinkage and Selection Operator (lasso), elastic net, and other machine learning methods tend to underperform. In this paper, we introduce a novel attribute selection procedure in which, after an initial screening step, we cluster the attributes into different groups and apply the group lasso to select first the true attribute groups and then the true attributes. The procedure is particularly suited to high-dimensional data sets where the attributes are highly correlated. We test our procedure on several simulated data sets and a real-world data set from a marketing database. The results show that our proposed procedure achieves higher predictive performance while selecting a much smaller set of attributes than other state-of-the-art methods.
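The three stages named in the abstract (screening, clustering of attributes, group lasso) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the screening rule, the clustering method, or the solver, so the marginal-correlation screen, the average-linkage clustering on 1 - |correlation|, the sqrt(group size) penalty weight, the proximal-gradient solver, and all function names and parameter values below are assumptions made for illustration. The final step of selecting individual attributes within the chosen groups is left out because the abstract does not say how it is done.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def screen(X, y, keep):
    """Assumed screening rule: keep the `keep` columns most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.sort(np.argsort(score)[::-1][:keep])


def cluster_columns(X, threshold):
    """Assumed clustering: average linkage on the distance 1 - |correlation|."""
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(dist, 0.0)
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=threshold, criterion="distance")


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (0.5/n)*||y - Xb||^2 + lam * sum_g sqrt(p_g)*||b_g||_2."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Block soft-thresholding: a whole group is zeroed or shrunk together.
            z[idx] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[idx]
        beta = z
    return beta


# Simulated data (hypothetical): three latent factors, each copied four times
# with small noise (columns 0-11), plus 20 pure-noise columns. Only the first
# factor, i.e. columns 0-3, drives the response.
rng = np.random.default_rng(0)
n = 300
latent = rng.standard_normal((n, 3))
blocks = np.repeat(latent, 4, axis=1) + 0.1 * rng.standard_normal((n, 12))
X = np.hstack([blocks, rng.standard_normal((n, 20))])
y = 3.0 * latent[:, 0] + 0.5 * rng.standard_normal(n)

kept = screen(X, y, keep=12)                      # stage 1: screening
Xs = (X[:, kept] - X[:, kept].mean(axis=0)) / X[:, kept].std(axis=0)
ys = (y - y.mean()) / y.std()
labels = cluster_columns(Xs, threshold=0.5)       # stage 2: clustering
beta = group_lasso(Xs, ys, labels, lam=0.3)       # stage 3: group lasso
selected = kept[beta != 0]
print("columns kept by screening:", kept)
print("cluster labels:", labels)
print("columns in selected groups:", selected)
```

Because the penalty acts on whole groups, highly correlated attributes that land in the same cluster enter or leave the model together, which is the behaviour the abstract contrasts with the plain lasso and elastic net.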

Bibliographic details

Source
Annals of Operations Research | 2021, Issue 2 | pp. 233-263 | 31 pages
Authors
Rezaei Mostafa; Cribben Ivor; Samorani Michele
Author affiliations

Univ Alberta, Alberta Sch Business, Operat & Informat Syst, Edmonton, AB T6G 2R6, Canada;

Univ Alberta, Alberta Sch Business, Finance & Stat Anal, Edmonton, AB T6G 2R6, Canada;

Santa Clara Univ, Leavey Sch Business, Informat Syst & Analyt, Santa Clara, CA 95053, USA

Original format: PDF
Language: English
Keywords
Relational attribute generation; Feature selection; Lasso; Elastic net; Clustering

Similar literature

1. Methods of forward feature selection based on the aggregation of classifiers generated by single attribute. [J]. Luo L, Ye L, Luo M. Computers in Biology and Medicine. 2011, Issue 7

2. Mining High Dimensional Data Using Attribute Clustering-Based Feature Subset Selection Algorithm. [J]. Vivek Ravindra Prasad Pandey, T. Venu, N. Subhash Chandra. International Journal of Computer Trends and Technology. 2014, Issue 2

3. PSO with surrogate models for feature selection: static and dynamic clustering-based methods. [J]. Hoai Bach Nguyen, Xue Bing, Andreae Peter. Memetic Computing. 2018, Issue 3

4. Clustering-Based Joint Feature Selection for Semantic Attribute Prediction. [C]. Lin Chen, Baoxin Li. International Joint Conference on Artificial Intelligence. 2016

5. Image content matching and retrieval using attributed feature-relational graph and perceptual organizations. [D]. Li, Wenyi. 2002

6. A novel selection method of seismic attributes based on gray relational degree and support vector machine. [O]. Yaping Huang, Haijun Yang, Xuemei Qi. 2012

7. Panel of Attribute Selection Methods to Rank Features Drastically Improves Accuracy in Filtering Web-pages Suitable for Education. [O]. Vladimir Estivill-Castro, Matteo Lombardi, Alessandro Marani. 2019
