Why are neural networks sometimes much more accurate than decision trees: an analysis on a bio-informatics problem

Abstract

Bio-informatics data sets may be large in the number of examples and/or the number of features. Predicting the secondary structure of proteins from amino acid sequences is one example of high-dimensional data for which large training sets exist. The data from the KDD Cup 2001 on the binding of compounds to thrombin is another example of a very high-dimensional data set. This type of data set can require significant computing resources to train a neural network. In general, decision trees will require much less training time than neural networks. There have been a number of studies on the advantages of decision trees relative to neural networks for specific data sets. There are often statistically significant, though typically not very large, differences. Here, we examine one case in which a neural network greatly outperforms a decision tree: predicting the secondary structure of proteins. The hypothesis that the neural network learns important features of the data through its hidden units is explored by using a neural network to transform data for decision tree training. Experiments show that this explains some of the performance difference, but not all. Ensembles of decision trees are compared with a single neural network. It is our conclusion that the problem of protein secondary structure prediction exhibits some characteristics that are fundamentally better exploited by a neural network model.
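
The idea of using a neural network to transform data for decision tree training can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic three-class data set stands in for the protein data, scikit-learn is used for both models, the network has a single hidden layer of 16 ReLU units, and `hidden_features` is a hypothetical helper name. The sketch trains the network, feeds its hidden-unit activations to a decision tree, and compares that tree with one trained on the raw features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the protein data used in the paper.
X, y = make_classification(n_samples=600, n_features=40, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Network size and settings are illustrative assumptions.
mlp = MLPClassifier(hidden_layer_sizes=(16,), activation="relu",
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)


def hidden_features(model, X):
    """Return first-hidden-layer ReLU activations of a fitted MLPClassifier."""
    return np.maximum(0.0, X @ model.coefs_[0] + model.intercepts_[0])


# Transform the inputs with the trained network's hidden layer.
H_train = hidden_features(mlp, X_train)
H_test = hidden_features(mlp, X_test)

# One tree on the raw features, one on the hidden-unit features.
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_hidden = DecisionTreeClassifier(random_state=0).fit(H_train, y_train)

scores = {
    "tree on raw features": tree_raw.score(X_test, y_test),
    "tree on hidden-unit features": tree_hidden.score(H_test, y_test),
    "neural network": mlp.score(X_test, y_test),
}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

If the hidden units capture useful features, the tree trained on `H_train` would be expected to close part of the gap to the network. The numbers printed for this synthetic data say nothing about the results reported in the paper.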
