Sequential pattern classification without explicit feature extraction.

Abstract

Feature selection, representation and extraction are integral to statistical pattern recognition systems. Features are usually represented as vectors that capture expert knowledge of measurable discriminative properties of the classes to be distinguished, and the feature selection process entails manual expert involvement and repeated experiments. Automatic feature selection is necessary when (i) expert knowledge is unavailable, (ii) the distinguishing features among classes cannot be quantified, or (iii) a fixed-length feature description cannot faithfully reflect all possible variations of the classes, as in the case of sequential patterns (e.g. time series data). Automatic feature selection and extraction are also useful when developing pattern recognition systems that must scale to new sets of classes. For example, an OCR system designed with an explicit feature selection process for the alphabet of one language usually does not scale to the alphabet of another language.

One approach to avoiding explicit feature selection is to use a (dis)similarity representation instead of a feature vector representation: the training set is represented by a similarity matrix, and new objects are classified based on their similarity to the samples in the training set. A suitable similarity measure can also be used to increase the classification efficiency of traditional classifiers such as Support Vector Machines (SVMs).

In this thesis we establish new techniques for sequential pattern recognition without explicit feature extraction, for applications where (i) a robust similarity measure exists to distinguish the classes and (ii) the classifier (such as an SVM) uses a similarity measure for both training and evaluation. We investigate the use of similarity measures in applications such as on-line signature verification and on-line handwriting recognition.
A paucity of training samples can render traditional training methods ineffective, as in the case of on-line signatures, where the number of training samples rarely exceeds 10. We present a new regression measure (ER²) that can classify multi-dimensional sequential patterns without training on a large number of prototypes. When sufficient training prototypes are available, we use ER² as a preprocessing filter to speed up SVM evaluation. We demonstrate the efficacy of a two-stage recognition system that uses Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) within the supervised classification framework of the SVM. We present experiments with off-line digit images in which the pixels are simply ordered in a predetermined manner to simulate sequential patterns. We also describe the Generalized Regression Model (GRM) for the unsupervised classification (clustering) of sequential patterns.
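
The abstract does not give a formula for the regression measure (ER²). The sketch below assumes one natural multi-dimensional extension of the coefficient of determination R². Cross-deviations and variances are pooled over all K dimensions before the ratio is taken, so for K = 1 it reduces to the squared Pearson correlation. It also sketches how such a measure could act as a preprocessing filter, shortlisting candidate classes before a more expensive SVM evaluation. The names `er2` and `er2_prefilter`, the shortlist size `k`, and the toy data are hypothetical. Sequences are assumed to be resampled to equal length.

```python
from typing import List, Sequence

# A K-dimensional sequence stored as K channels of n samples each,
# e.g. [x(t), y(t)] for an on-line signature resampled to n points (assumption).
MultiSeq = Sequence[Sequence[float]]

def er2(x: MultiSeq, y: MultiSeq) -> float:
    """ASSUMED form of ER2 (not taken from the thesis):

    ER2 = (sum_k sum_i dx_ki*dy_ki)^2 / (sum_k sum_i dx_ki^2 * sum_k sum_i dy_ki^2),
    where dx_ki = x_ki - mean(x_k). For K = 1 this is Pearson's r squared.
    """
    if len(x) != len(y) or any(len(a) != len(b) for a, b in zip(x, y)):
        raise ValueError("sequences must have the same number of channels and samples")
    sxy = sxx = syy = 0.0
    for xs, ys in zip(x, y):  # accumulate one channel at a time
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        for a, b in zip(xs, ys):
            sxy += (a - mx) * (b - my)
            sxx += (a - mx) ** 2
            syy += (b - my) ** 2
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a flat sequence carries no shape to correlate
    return sxy * sxy / (sxx * syy)

def er2_prefilter(query: MultiSeq, prototypes: List[MultiSeq],
                  labels: List[str], k: int) -> List[str]:
    """Hypothetical first stage: shortlist up to k distinct classes, best ER2 first.

    A second-stage classifier (e.g. an SVM) would then only be evaluated
    on these candidate classes instead of on every class.
    """
    ranked = sorted(range(len(prototypes)),
                    key=lambda i: er2(query, prototypes[i]), reverse=True)
    candidates: List[str] = []
    for i in ranked:
        if len(candidates) >= k:
            break
        if labels[i] not in candidates:
            candidates.append(labels[i])
    return candidates

# Toy 2-D trajectories (hypothetical data): channel 0 is x(t), channel 1 is y(t).
query = [[0, 1, 2, 3], [0, 1, 0, 1]]
prototypes = [
    [[0, 2, 4, 6], [1, 3, 1, 3]],  # query scaled by 2 and shifted: ER2 = 1
    [[0, 1, 2, 3], [1, 0, 1, 0]],  # y channel out of phase: ER2 = 16/36
    [[1, 0, 1, 0], [0, 1, 2, 3]],  # channels swapped: ER2 = 0
]
labels = ["a", "b", "c"]
print(er2_prefilter(query, prototypes, labels, k=2))  # prints: ['a', 'b']
```

Being a squared correlation, this assumed form is invariant to shifts and to a common scaling of all channels, which is why prototype `a` scores exactly 1. It also ignores the sign of the correlation, so a mirror-image sequence can score as high as a matching one.
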