A clustering and principal component approach to exemplar based machine learning for classification identification.



Abstract

Classifying detections is an important field of study in many disciplines. Typically, data can be represented as a multidimensional vector defined within some hyperspace (e.g., the sepal length, sepal width, petal length, and petal width of an iris flower). Many classification problems can be viewed as processing an unknown data vector so as to produce an output that correctly categorizes it (e.g., is the iris flower Iris Setosa, Iris Versicolour, or Iris Virginica?). Exemplar based machine learning techniques tackle these problems by learning from representative training data, and several popular algorithms that apply these techniques in various ways have been published in the literature. This study explores and develops an innovative exemplar based machine learning methodology that combines clustering techniques with the tools of principal components analysis (PCA). Clustering segments each class's training data, which may have an arbitrary and complex multidimensional shape, into pieces that can each be adequately generalized using PCA. These generalizations are then used to build an exemplar based machine learning algorithm capable of classifying unknown data. The methodology was applied to twenty-one real-world data sets obtained from the University of California at Irvine (UCI) data repository, and the results were compared with those of other research methods. For twelve of the twenty-one data sets tested, the overall accuracy equaled or exceeded the best result of any other method the author found.

The development of a measure of confidence for each classification declared for a given unknown is then discussed. Concepts are proposed that would allow the amount of information presented to a user to be reduced according to the confidence level that the classification was made correctly. This confidence based filtering offers the potential to further increase the overall accuracy of the classifications that are reported. In summary, the results suggest that the proposed methodology has a high degree of real-world applicability and could be used across a wide range of application domains with highly competitive accuracy.
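The abstract does not spell out how the clustering and PCA generalizations are combined at classification time, so the following is only a minimal sketch under stated assumptions, not the author's algorithm. It assumes plain k-means for the per-class clustering and keeps `q` principal axes per cluster. It then scores an unknown vector against each cluster with a SIMCA / probabilistic-PCA-style distance: squared in-subspace coordinates divided by the component variances, plus the squared residual off the subspace divided by the mean residual variance. The unknown is assigned the class of its best-scoring cluster. The confidence value (the relative margin between the best and second-best class scores) and the `threshold` used for filtering are likewise hypothetical stand-ins for the confidence measure the dissertation discusses. All names (`ClusterPCAClassifier`, `fit_local_pca`, `predict_filtered`, and so on) are illustrative, and the data are synthetic.

```python
import numpy as np

# Hypothetical sketch: per-class k-means + local PCA models.
# This is NOT the dissertation's exact algorithm; the scoring rule and
# confidence margin below are illustrative assumptions.


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row of X."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X))
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an emptied cluster keeps its old centroid
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def fit_local_pca(points, q, eps=1e-6):
    """Summarise one cluster by its mean, top-q principal axes and variances."""
    n, d = points.shape
    q = max(0, min(q, d - 1, n - 1))  # leave at least one residual dimension
    mu = points.mean(axis=0)
    _, s, vt = np.linalg.svd(points - mu, full_matrices=False)
    lam_all = s ** 2 / max(n - 1, 1)      # sample variance along every axis
    axes = vt[:q].T                       # d x q matrix of kept axes
    lam = np.maximum(lam_all[:q], eps)    # variance along kept axes (floored)
    resid_var = max((lam_all.sum() - lam_all[:q].sum()) / (d - q), eps)
    return mu, axes, lam, resid_var


def score(x, model):
    """Lower is better: scaled in-subspace distance plus scaled residual."""
    mu, axes, lam, resid_var = model
    c = x - mu
    z = axes.T @ c        # coordinates inside the principal subspace
    r = c - axes @ z      # component orthogonal to that subspace
    return float((z ** 2 / lam).sum() + (r @ r) / resid_var)


class ClusterPCAClassifier:
    def __init__(self, k=2, q=1):
        self.k, self.q = k, q

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.models = []  # (class label, local PCA model) pairs
        for label in np.unique(y):
            Xc = X[y == label]
            assign = kmeans(Xc, self.k)
            for j in np.unique(assign):
                self.models.append((label, fit_local_pca(Xc[assign == j], self.q)))
        return self

    def predict_with_confidence(self, x):
        x = np.asarray(x, dtype=float)
        best = {}  # lowest score per class over that class's clusters
        for label, model in self.models:
            best[label] = min(score(x, model), best.get(label, np.inf))
        ranked = sorted(best.items(), key=lambda kv: kv[1])
        if len(ranked) == 1:
            return ranked[0][0], 1.0
        (label, s1), (_, s2) = ranked[0], ranked[1]
        # Relative margin between best and runner-up class: 0 = tie, near 1 = clear.
        return label, (s2 - s1) / (s2 + 1e-12)

    def predict_filtered(self, x, threshold=0.5):
        """Withhold (return None) any classification below the confidence threshold."""
        label, conf = self.predict_with_confidence(x)
        return (label if conf >= threshold else None), conf


# Usage on synthetic data: a thin rod along the x-axis vs. a compact blob.
rng = np.random.default_rng(42)
t = rng.uniform(-3, 3, size=(60, 1))
rod = np.hstack([t, 0.1 * rng.standard_normal((60, 2))])
blob = np.array([5.0, 5.0, 5.0]) + 0.5 * rng.standard_normal((60, 3))
X = np.vstack([rod, blob])
y = np.array([0] * 60 + [1] * 60)

clf = ClusterPCAClassifier(k=2, q=1).fit(X, y)
print(clf.predict_with_confidence([2.5, 0.0, 0.0]))
print(clf.predict_filtered([2.5, 2.5, 2.5], threshold=0.9))
```

The rod-shaped class illustrates why per-cluster PCA can help: a point far along the rod's axis stays close to that class's local subspaces even though it is far from the class mean. Raising `threshold` trades coverage for accuracy, since more unknowns come back as `None` (withheld from the user) while the ones that are reported have a larger margin between the winning class and the runner-up.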

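Bibliographic record

  • Author: Cassella, Vincent A.
  • Author affiliation: The Catholic University of America.
  • Degree-granting institution: The Catholic University of America.
  • Subject: Engineering, Electronics and Electrical; Artificial Intelligence.
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 125
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Radio electronics and telecommunication technology; Artificial intelligence theory
  • Date added to database: 2022-08-17 11:39:31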
