Journal: Computers in Biology and Medicine

Forest classification trees and forest support vector machines algorithms: Demonstration using microarray data.

Abstract

Classification into multiple classes when the measured variables outnumber the samples is a major methodological challenge in -omics studies. Two algorithms that overcome the dimensionality problem are presented: the forest classification tree (FCT) and the forest support vector machines (FSVM). In FCT, a set of variables is randomly chosen and a classification tree (CT) is grown using a forward classification algorithm. The process is repeated and a forest of CTs is derived. Finally, the most frequent variables from the trees with the smallest apparent misclassification rate (AMR) are used to construct a productive tree. In FSVM, the CTs are replaced by SVMs. The methods are demonstrated using prostate gene expression data for classifying tissue samples into four tumor types. With a threshold split value of 0.001 and 100 markers, the productive CT consisted of 29 terminal nodes and achieved perfect classification (AMR=0). When the threshold value was set to 0.01, a tree with 17 terminal nodes was constructed based on 15 markers (AMR=7%). In FSVM, reducing the fraction of the forest that was used to construct the best classifier from the top 80% to the top 20% reduced the misclassification rate to 25% (when using 200 markers). The proposed methodologies may be used for identifying important variables in high-dimensional data. Furthermore, the FCT allows exploring the data structure and provides a decision rule.
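
The selection loop described in the abstract (draw a random set of variables, fit a classifier, repeat, then keep the variables that occur most often among the classifiers with the smallest AMR) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier is swapped in for the CT or SVM so the example needs nothing beyond the Python standard library, and the function names, parameter values, and synthetic data are all hypothetical.

```python
import random
from collections import Counter


def apparent_error(X, y, variables):
    """AMR (error on the training data itself) of a nearest-centroid
    classifier that sees only the columns listed in `variables`.
    Assumption: this simple learner stands in for the paper's CT or SVM."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[v] for r in rows) / len(rows) for v in variables]

    def predict(row):
        point = [row[v] for v in variables]
        return min(
            classes,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )

    errors = sum(predict(row) != label for row, label in zip(X, y))
    return errors / len(y)


def select_variables(X, y, n_learners=500, subset_size=3,
                     top_fraction=0.2, n_keep=2, seed=0):
    """Build a 'forest' of learners on random variable subsets and return
    the variables that are most frequent among the lowest-AMR learners."""
    rng = random.Random(seed)
    n_vars = len(X[0])
    forest = []
    for _ in range(n_learners):
        variables = rng.sample(range(n_vars), subset_size)
        forest.append((apparent_error(X, y, variables), variables))
    forest.sort(key=lambda pair: pair[0])  # smallest AMR first
    n_top = max(1, int(top_fraction * n_learners))
    counts = Counter(v for _, variables in forest[:n_top] for v in variables)
    return [v for v, _ in counts.most_common(n_keep)]


# Synthetic toy data (hypothetical): 4 classes, 10 samples each, 20 variables.
# Only variables 0 and 1 carry class information; the rest are noise.
data_rng = random.Random(1)
X, y = [], []
for label in range(4):
    for _ in range(10):
        row = [data_rng.gauss(0, 1) for _ in range(20)]
        row[0] += 10 * (label % 2)
        row[1] += 10 * (label // 2)
        X.append(row)
        y.append(label)

selected = select_variables(X, y)
print(sorted(selected))
```

In the procedure the abstract describes, the selected variables would then be passed to a final "productive" classifier. The `top_fraction` argument plays the role of the forest fraction that the abstract reports varying between the top 80% and the top 20%.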