
Dimension reduction algorithms in data mining, with applications.


Abstract

This thesis concentrates on the theory, implementation, and application of dimension reduction in data mining. Many real-world applications such as text mining, image retrieval, face recognition, and microarray data analysis involve high-dimensional data, where the dimension can often run into the thousands. Traditional machine learning and data mining techniques are not effective when dealing with such high-dimensional data because of the so-called curse of dimensionality. A natural approach to deal with this problem is to apply dimension reduction as a pre-processing step.

The first part of the thesis presents a dimension reduction technique for data in matrix form. The essence of the proposed algorithm is that it applies a bilinear transformation to the data. Such a bilinear transformation is particularly appropriate for data in matrix representation and often leads to lower computational costs than traditional algorithms. A natural application of the algorithm is in image compression and retrieval, where each image is represented in its native matrix form. Extensive experiments on image data show that the proposed algorithm outperforms traditional ones in terms of computational time and space requirements, while maintaining competitive classification performance.

The second part of the thesis focuses on generalizing classical Linear Discriminant Analysis (LDA) to overcome problems associated with undersampled data, where the data dimension is much greater than the number of data items. The optimization criterion in classical LDA fails when the scatter matrices are singular, which is the case for undersampled problems. A new optimization criterion is developed that is applicable to undersampled problems, and the algorithms based on it are shown to be very competitive in classification.

The final part of the thesis considers the problem of designing an efficient, incremental dimension reduction algorithm. An LDA-based incremental dimension reduction algorithm is developed. Unlike other LDA-based algorithms, it does not require the whole data matrix to reside in main memory, which is desirable for large datasets. More importantly, as new data items are inserted, the proposed algorithm constrains the computational cost through efficient incremental updating techniques. Experiments reveal that the proposed algorithm is competitive in classification while having much lower computational cost, especially when new data items are inserted dynamically.
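
For intuition, the bilinear transformation of the first part can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the thesis algorithm: the projection matrices L and R below are arbitrary orthonormal placeholders, whereas the proposed method learns them from the data.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 32, 32))   # 100 samples kept in native 32x32 matrix form
    k1, k2 = 5, 5                            # reduced row and column dimensions

    # Placeholder orthonormal projections; the thesis optimizes L and R from the data.
    L, _ = np.linalg.qr(rng.standard_normal((32, k1)))
    R, _ = np.linalg.qr(rng.standard_normal((32, k2)))

    # Bilinear transformation Y_i = L^T X_i R: each 32x32 sample becomes a 5x5 matrix,
    # so per-sample storage drops from 1024 to 25 values.
    Y = np.einsum('rk,nrc,cm->nkm', L, X, R)
    print(Y.shape)                           # (100, 5, 5)

Compared with first vectorizing each image into a 1024-dimensional vector and applying one large projection, the two small matrices L and R are cheaper to store and apply, which is the kind of time and space advantage the abstract refers to.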
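
As background for the second part, one standard formulation of the classical LDA criterion, for a transformation matrix G, is

    J(G) = trace( (G^T S_w G)^{-1} (G^T S_b G) ),

where S_w and S_b are the within-class and between-class scatter matrices. For undersampled data with d features, n samples, and c classes, S_w has rank at most n - c < d and is therefore singular, so the inverse above need not exist and the usual eigen-decomposition involving S_w^{-1} S_b breaks down. This is the failure mode the thesis's new criterion is designed to handle; the abstract does not spell out the form of that criterion, so it is not reproduced here.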

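For the final part, the key idea is that the statistics LDA needs can be maintained incrementally as samples arrive, instead of being recomputed from the full data matrix. The sketch below illustrates only that streaming-update idea by maintaining running per-class counts and means one sample at a time; the thesis's incremental algorithm additionally updates the quantities needed for the discriminant transformation itself.

    import numpy as np

    class IncrementalClassStats:
        """Running per-class counts and means, updated one sample at a time."""

        def __init__(self, dim):
            self.dim = dim
            self.n = 0
            self.global_mean = np.zeros(dim)
            self.counts = {}
            self.means = {}

        def insert(self, x, label):
            # Update the global mean without storing past samples.
            self.n += 1
            self.global_mean += (x - self.global_mean) / self.n
            # Update the running mean of this sample's class.
            c = self.counts.get(label, 0) + 1
            mu = self.means.get(label, np.zeros(self.dim))
            self.means[label] = mu + (x - mu) / c
            self.counts[label] = c

    stats = IncrementalClassStats(dim=1000)
    rng = np.random.default_rng(1)
    for i in range(500):
        stats.insert(rng.standard_normal(1000), label=i % 3)

Each insertion costs time proportional to the dimension, and the memory footprint does not grow with the number of samples seen, which matches the abstract's point about not keeping the whole data matrix in main memory.
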
Bibliographic details

  • Author

    Ye, Jieping.

  • Author affiliation

    University of Minnesota.

  • Degree-granting institution: University of Minnesota.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2005
  • Pagination: 91 p.
  • Total pages: 91
  • Format: PDF
  • Language: eng
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords
