Conference: Annual Neural Information Processing Systems Conference

Feature Selection and Classification on Matrix Data: From Large Margins To Small Covering Numbers


Abstract

We investigate the problem of learning a classification task for datasets that are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel that operates on object pairs, and we show that, under mild assumptions, these kernels correspond to dot products in some (unknown) feature space. By minimizing a bound on the generalization error of a linear classifier, obtained using covering numbers, we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices that are not positive definite, and not even symmetric or square. We then consider the case in which row objects are interpreted as features. We suggest an additional constraint that imposes sparseness on the row objects, and we show that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where "column" objects correspond to samples, "row" objects correspond to genes, and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification, and using support vector machines and K-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results.
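The claim that matrix entries can be read as dot products in some feature space can be illustrated for a finite matrix. The sketch below is not the paper's construction. It assumes NumPy and uses a singular value decomposition to build one possible pair of embeddings. The function name `factor_as_dot_products` and the random test matrix are hypothetical, chosen only for illustration.

```python
import numpy as np


def factor_as_dot_products(K):
    """Return row and column embeddings whose dot products reproduce K.

    Illustrative only: uses the thin SVD K = U diag(s) V^T and splits
    the singular values evenly between the two sides.
    """
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    root = np.sqrt(s)
    row_feats = U * root      # one embedding vector per row object
    col_feats = Vt.T * root   # one embedding vector per column object
    return row_feats, col_feats


# A matrix that is neither square nor symmetric (hypothetical data).
rng = np.random.default_rng(0)
K = rng.normal(size=(5, 8))

rows, cols = factor_as_dot_products(K)

# Entry K[i, j] equals the dot product of row embedding i and column embedding j.
reconstruction_error = np.max(np.abs(rows @ cols.T - K))
```

Here `rows @ cols.T` recovers `K` up to floating-point error, so each entry is a dot product between a row-object vector and a column-object vector even though `K` is rectangular and not positive definite.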
