Sixteenth Annual Neural Information Processing Systems (NIPS) Conference; Dec 9-14, 2002; British Columbia, Canada

Feature Selection and Classification on Matrix Data: From Large Margins To Small Covering Numbers



Abstract

We investigate the problem of learning a classification task for datasets which are described by matrices. Rows and columns of these matrices correspond to objects, where row and column objects may belong to different sets, and the entries in the matrix express the relationships between them. We interpret the matrix elements as being produced by an unknown kernel which operates on object pairs, and we show that, under mild assumptions, these kernels correspond to dot products in some (unknown) feature space. By minimizing a bound on the generalization error of a linear classifier, obtained using covering numbers, we derive an objective function for model selection according to the principle of structural risk minimization. The new objective function has the advantage that it allows the analysis of matrices which are not positive definite, and not even symmetric or square. We then consider the case that row objects are interpreted as features. We suggest an additional constraint which imposes sparseness on the row objects and show that the method can then be used for feature selection. Finally, we apply this method to data obtained from DNA microarrays, where "column" objects correspond to samples, "row" objects correspond to genes, and matrix elements correspond to expression levels. Benchmarks are conducted using standard one-gene classification, as well as support vector machines and k-nearest neighbors after standard feature selection. Our new method extracts a sparse set of genes and provides superior classification results.
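The abstract describes a sparseness constraint on the row objects (genes) that turns the classifier into a feature selector. As a rough illustration of that idea only, and not of the authors' covering-number objective, the sketch below trains an L1-penalized linear classifier on the columns of a synthetic expression matrix using proximal gradient descent; all names and parameters (n_genes, lam, the squared-hinge loss, and so on) are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "expression matrix": rows are genes (features), columns are samples.
n_genes, n_samples = 200, 60
K = rng.normal(size=(n_genes, n_samples))
# Only the first five genes carry class information.
y = np.sign(K[:5].sum(axis=0) + 0.1 * rng.normal(size=n_samples))

def train_sparse_linear(K, y, lam=0.05, lr=1e-3, n_iter=5000):
    """L1-penalized squared-hinge classifier on the columns of K,
    trained with proximal gradient descent (ISTA). Returns the weight
    vector over the row objects (genes)."""
    w = np.zeros(K.shape[0])
    for _ in range(n_iter):
        margins = y * (w @ K)                  # one margin per sample
        active = margins < 1.0                 # samples violating the margin
        # Gradient of the averaged squared-hinge loss with respect to w.
        grad = -2.0 * K[:, active] @ (y[active] * (1.0 - margins[active]))
        w -= lr * grad / K.shape[1]
        # Soft thresholding enforces sparseness over the row objects.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w = train_sparse_linear(K, y)
selected = np.flatnonzero(w)
accuracy = np.mean(np.sign(w @ K) == y)
print(f"{selected.size} genes selected, training accuracy {accuracy:.2f}")

The soft-thresholding step drives most gene weights exactly to zero, so the surviving nonzero entries of w play the role of the selected features. The paper's own objective is instead derived from a covering-number bound, with sparseness imposed as an additional constraint.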
