Journal: Journal of computer sciences

Optimizing Feature Construction Process for Dynamic Aggregation of Relational Attributes | Science Publications



Abstract

Problem statement: The importance of input representation is well recognized in machine learning. Feature construction is one method for generating relevant features from the data to be learned. This study asks whether the descriptive accuracy of the DARA algorithm benefits from feature construction. Specifically, it applies a genetic algorithm to optimize the feature construction process that generates input data for the data summarization method called Dynamic Aggregation of Relational Attributes (DARA).

Approach: The DARA algorithm summarizes data stored in non-target tables by clustering the records into groups, where multiple records in the non-target tables correspond to a single record in the target table. Feature construction methods are applied to improve the descriptive accuracy of DARA. The task is therefore to construct a relevant set of features for the DARA algorithm using a genetic-based algorithm.

Results: The experimental results show that the quality of the summarized data is directly influenced by the methods used to create the patterns that represent records in the (n×p) TF-IDF weighted frequency matrix. Evaluation of the genetic-based feature construction algorithm showed that the data summarization results can be improved by constructing features with the Cluster Entropy (CE) genetic-based feature construction algorithm.

Conclusion: Constructing features with the CE genetic-based feature construction algorithm improves the data summarization results.
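The abstract names three ingredients: patterns built from the attributes of non-target records, an (n×p) TF-IDF weighted frequency matrix over those patterns, and a cluster-entropy score. The sketch below illustrates how these pieces could fit together. It is not the paper's implementation: the function names (`build_patterns`, `tfidf_matrix`, `cluster_entropy`), the pattern string format, the plain `tf * log(n / df)` weighting, and the use of class-label entropy per cluster are all assumptions made for illustration, since the abstract does not define them.

```python
import math
from collections import Counter


def build_patterns(records, grouping):
    """Turn each record (a tuple of attribute values) into one pattern per
    attribute group. `grouping` is a hypothetical encoding of a constructed
    feature set: a list of tuples of attribute indices to combine."""
    patterns = []
    for record in records:
        for group in grouping:
            key = "+".join(str(i) for i in group)
            value = "|".join(str(record[i]) for i in group)
            patterns.append(f"{key}={value}")
    return patterns


def tfidf_matrix(bags):
    """Build an n x p matrix from n bags of patterns (one bag per target
    record). Assumed weighting: term frequency times log(n / document freq)."""
    n = len(bags)
    vocab = sorted({p for bag in bags for p in bag})
    df = Counter(p for bag in bags for p in set(bag))
    matrix = []
    for bag in bags:
        tf = Counter(bag)
        matrix.append([tf[p] * math.log(n / df[p]) for p in vocab])
    return vocab, matrix


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average, over clusters, of the entropy (in bits) of the
    class labels inside each cluster. Lower means purer clusters."""
    n = len(cluster_ids)
    members_by_cluster = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members_by_cluster.setdefault(cluster, []).append(label)
    total = 0.0
    for members in members_by_cluster.values():
        size = len(members)
        counts = Counter(members)
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total
```

Under these assumptions, a genetic search would propose candidate values of `grouping`, build the matrix for each candidate, cluster its rows, and use `cluster_entropy` as the fitness to minimize. The clustering step and the genetic operators are left out here because the abstract gives no details about them.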
