Evolutionary optimization and ensemble techniques for data mining and pattern recognition.

Abstract

This dissertation addresses three fundamental problems in data mining and pattern recognition (feature extraction, modeling, and data clustering) through evolutionary computation and ensemble-based approaches.

We propose feature extraction methods that use genetic algorithms to improve pattern classification. New features are synthesized by merging the values of the original variables during the search. The genetic search for (sub-)optimal combinations of values uses a graph-based encoding of candidate solutions. A compact solution representation with minimal redundancy is used for a wide class of grouping problems, including the clustering of variable values. Genetic value clustering is applied to text categorization, to the DNA-based assignment of individuals in population genetics, and to the parametric learning of Bayesian network classifiers. We show that such feature extraction yields better predictive accuracy of classification decisions.

We develop genetic programming algorithms for modeling input-output mappings of continuous variables that incorporate dynamic fitting of the free parameters of evolved models. Traditional genetic programming is extended with gradient descent optimization of the leaf coefficients of tree-like programs during the evolutionary search, which algorithmic differentiation makes possible. Experimental results on a set of symbolic regression problems show significant improvements in both computational requirements and modeling accuracy.

Ensembles of partitions of data sets are studied in two respects: the combination of multiple clusterings, and the generation of clusterings for an ensemble. We develop two efficient consensus functions for finding a combined partition of good quality. The first consensus function uses an information-theoretic principle based on maximal generalized mutual information. The second finds a consensus clustering by estimating a probabilistic mixture model from the observed ensemble.
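The coefficient-fitting step from the genetic programming paragraph can be illustrated with a small sketch. It assumes a hypothetical tuple-based tree format (`("add", l, r)`, `("mul", l, r)`, `("sin", t)`, `("x",)`, `("c", i)`) and uses forward-mode algorithmic differentiation with dual numbers, so each leaf coefficient costs one extra evaluation sweep. The names `Dual`, `evaluate`, and `fit_coefficients`, the learning rate, and the step count are illustrative choices, not taken from the dissertation. In a full system, a routine like this would run inside the evolutionary loop on each candidate tree.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Dual:
    """Forward-mode AD number: a value and its derivative w.r.t. one seeded coefficient."""
    val: float
    der: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.der + other.der)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val, self.der * other.val + self.val * other.der)


def dsin(a: Dual) -> Dual:
    return Dual(math.sin(a.val), math.cos(a.val) * a.der)


def evaluate(tree: tuple, x: float, coeffs: Sequence[float], seed: int) -> Dual:
    """Evaluate a (hypothetical) tuple tree; the leaf coefficient `seed` gets derivative 1."""
    op = tree[0]
    if op == "x":
        return Dual(x, 0.0)
    if op == "c":
        i = tree[1]
        return Dual(coeffs[i], 1.0 if i == seed else 0.0)
    if op == "sin":
        return dsin(evaluate(tree[1], x, coeffs, seed))
    left = evaluate(tree[1], x, coeffs, seed)
    right = evaluate(tree[2], x, coeffs, seed)
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown node {op!r}")


def fit_coefficients(tree: tuple, xs: Sequence[float], ys: Sequence[float],
                     coeffs: Sequence[float], lr: float = 0.1, steps: int = 1000) -> List[float]:
    """Gradient descent on the mean squared error w.r.t. the leaf coefficients."""
    coeffs = list(coeffs)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(coeffs)
        for k in range(len(coeffs)):  # one forward sweep per coefficient
            for x, y in zip(xs, ys):
                out = evaluate(tree, x, coeffs, seed=k)
                grad[k] += 2.0 * (out.val - y) * out.der / n
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    return coeffs


if __name__ == "__main__":
    # Tree for c0 * sin(x) + c1, fitted to data generated with c0 = 2.5, c1 = 1.0
    tree = ("add", ("mul", ("c", 0), ("sin", ("x",))), ("c", 1))
    xs = [i / 10 for i in range(-10, 11)]
    ys = [2.5 * math.sin(x) + 1.0 for x in xs]
    print([round(v, 4) for v in fit_coefficients(tree, xs, ys, [1.0, 0.0])])  # [2.5, 1.0]
```

Forward mode keeps the sketch short. With many coefficients per tree, reverse-mode differentiation would obtain the whole gradient in a single backward sweep instead.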
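The first consensus function can be sketched by scoring a candidate partition with generalized mutual information. This sketch assumes the degree-2 (quadratic, Havrda-Charvat) entropy H2(P) = 2(1 - sum of p^2) and defines I2(pi; pi_i) = H2(pi_i) - H2(pi_i | pi). The objective is the sum of I2 over all ensemble members. The greedy single-object relabeling search, its initialization from the first ensemble member, and all function names are hypothetical stand-ins for whatever optimizer the dissertation actually uses.

```python
from collections import Counter
from typing import List, Sequence


def quadratic_entropy(labels: Sequence[int]) -> float:
    """Degree-2 (Havrda-Charvat) entropy of a labelling: 2 * (1 - sum_k p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 2.0 * (1.0 - sum((c / n) ** 2 for c in Counter(labels).values()))


def generalized_mi(consensus: Sequence[int], part: Sequence[int]) -> float:
    """I2(consensus; part) = H2(part) - H2(part | consensus)."""
    n = len(part)
    conditional = 0.0
    for r in set(consensus):
        members = [part[j] for j in range(n) if consensus[j] == r]
        conditional += len(members) / n * quadratic_entropy(members)
    return quadratic_entropy(part) - conditional


def total_gmi(consensus: Sequence[int], ensemble: Sequence[Sequence[int]]) -> float:
    """Objective: summed generalized mutual information with every ensemble member."""
    return sum(generalized_mi(consensus, p) for p in ensemble)


def gmi_consensus(ensemble: Sequence[Sequence[int]], k: int, max_sweeps: int = 20) -> List[int]:
    """Hypothetical optimizer: greedy relabelling that never decreases total_gmi."""
    consensus = [label % k for label in ensemble[0]]  # start from the first member
    best = total_gmi(consensus, ensemble)
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(consensus)):
            keep = consensus[j]
            for r in range(k):
                if r == keep:
                    continue
                consensus[j] = r
                score = total_gmi(consensus, ensemble)
                if score > best + 1e-12:
                    best, keep, improved = score, r, True
                consensus[j] = keep  # restore the best label found so far
        if not improved:
            break
    return consensus


if __name__ == "__main__":
    ensemble = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
    print(gmi_consensus(ensemble, k=2))  # [0, 0, 0, 1, 1, 1]
```

Because the score compares partitions only through their contingency counts, the label names in each ensemble member do not matter: the third member above agrees fully with the consensus despite using swapped labels.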
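The second consensus function treats each object's vector of ensemble labels as an observation from a finite mixture. For illustration, the sketch below assumes that labels from different partitions are conditionally independent given the mixture component, each following a categorical distribution. It fits the model with expectation-maximization (EM) and assigns each object to the component with the largest posterior. The random initialization, the small pseudo-count `eps`, the iteration count, and the function name `mixture_consensus` are assumptions, not details from the dissertation.

```python
import random
from typing import Dict, List, Sequence


def mixture_consensus(ensemble: Sequence[Sequence[int]], k: int, iters: int = 100,
                      seed: int = 0, eps: float = 1e-6) -> List[int]:
    """EM for a mixture of products of categorical distributions over ensemble labels."""
    h, n = len(ensemble), len(ensemble[0])
    alphabets = [sorted(set(p)) for p in ensemble]  # labels used by each partition
    rng = random.Random(seed)
    alpha = [1.0 / k] * k
    # theta[m][i][label]: probability that component m emits `label` in partition i
    theta: List[List[Dict[int, float]]] = []
    for _ in range(k):
        comp = []
        for i in range(h):
            w = [rng.random() + 0.1 for _ in alphabets[i]]
            s = sum(w)
            comp.append({lab: wi / s for lab, wi in zip(alphabets[i], w)})
        theta.append(comp)

    resp = [[0.0] * k for _ in range(n)]
    for _ in range(iters):
        # E-step: posterior probability of each component for each object
        for j in range(n):
            scores = []
            for m in range(k):
                p = alpha[m]
                for i in range(h):
                    p *= theta[m][i][ensemble[i][j]]
                scores.append(p)
            total = sum(scores)
            resp[j] = [s / total for s in scores]
        # M-step: re-estimate mixing weights and per-partition label distributions
        for m in range(k):
            alpha[m] = sum(resp[j][m] for j in range(n)) / n
            for i in range(h):
                counts = {lab: eps for lab in alphabets[i]}  # pseudo-count avoids zeros
                for j in range(n):
                    counts[ensemble[i][j]] += resp[j][m]
                s = sum(counts.values())
                theta[m][i] = {lab: c / s for lab, c in counts.items()}

    return [max(range(k), key=lambda m: resp[j][m]) for j in range(n)]


if __name__ == "__main__":
    ensemble = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 2, 2, 2]]
    print(mixture_consensus(ensemble, k=2))
```

Multiplying many probabilities underflows for large ensembles, so a practical version would carry out the E-step with log-probabilities.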
It is demonstrated that the ensemble's partitions can be generated by weak clustering algorithms, in particular by clustering in random low-dimensional subspaces of the original feature space. Experiments indicate that an ensemble of weak partitions can be more accurate than a single sophisticated clustering algorithm. Finally, we consider how the partition generation process can be made adaptive to provide better decisions for patterns located near inter-cluster boundaries.
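Generating an ensemble from weak clusterings in random low-dimensional subspaces can be sketched as follows. As assumptions for illustration, each weak clusterer projects the data onto a random one-dimensional direction and runs plain k-means (Lloyd's algorithm) there. The partitions are then combined by a simple co-association vote: two objects are linked when they share a cluster in more than half of the partitions, and the connected components form the result. This combiner is a deliberately simple, swapped-in choice rather than either of the dissertation's consensus functions, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means on scalars, seeded at evenly spaced order statistics."""
    n = len(values)
    if k == 1:
        return [0] * n
    order = sorted(values)
    centers = [order[round(q * (n - 1) / (k - 1))] for q in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # keep the previous centre if a cluster becomes empty
                centers[c] = sum(members) / len(members)
    return labels


def random_projection_ensemble(points: Sequence[Sequence[float]], k: int,
                               n_members: int, seed: int = 0) -> List[List[int]]:
    """One weak partition per random unit direction (a random 1-D subspace)."""
    rng = random.Random(seed)
    dim = len(points[0])
    ensemble = []
    for _ in range(n_members):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        u = [c / norm for c in u]
        projected = [sum(a * b for a, b in zip(p, u)) for p in points]
        ensemble.append(kmeans_1d(projected, k))
    return ensemble


def coassociation_consensus(ensemble: Sequence[Sequence[int]],
                            threshold: float = 0.5) -> List[int]:
    """Link pairs co-clustered in more than `threshold` of partitions; label components."""
    n, h = len(ensemble[0]), len(ensemble)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            together = sum(1 for p in ensemble if p[i] == p[j]) / h
            if together > threshold:
                parent[find(i)] = find(j)
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
    ens = random_projection_ensemble(pts, k=2, n_members=31, seed=1)
    print(coassociation_consensus(ens))
```

Any single projection can merge the two groups when its direction is nearly orthogonal to the line joining them. The vote across many random directions is what makes the combined result robust to such weak members.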

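Bibliographic record

  • Author

    Topchy, Alexander P.

  • Affiliation

    Michigan State University

  • Degree-granting institution: Michigan State University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 172
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology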