Classification and knowledge discovery in protein databases.

Radivojac P; Chawla NV; Dunker AK; Obradovic Z

Abstract

We consider the problem of classification in noisy, high-dimensional, and class-imbalanced protein datasets. In order to design a complete classification system, we use a three-stage machine learning framework consisting of a feature selection stage, a method addressing noise and class-imbalance, and a method for combining biologically related tasks through a prior-knowledge based clustering. In the first stage, we employ Fisher's permutation test as a feature selection filter. Comparisons with the alternative criteria show that it may be favorable for typical protein datasets. In the second stage, noise and class imbalance are addressed by using minority class over-sampling, majority class under-sampling, and ensemble learning. The performance of logistic regression models, decision trees, and neural networks is systematically evaluated. The experimental results show that in many cases ensembles of logistic regression classifiers may outperform more expressive models due to their robustness to noise and low sample density in a high-dimensional feature space. However, ensembles of neural networks may be the best solution for large datasets. In the third stage, we use prior knowledge to partition unlabeled data such that the class distributions among non-overlapping clusters significantly differ. In our experiments, training classifiers specialized to the class distributions of each cluster resulted in a further decrease in classification error.
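
The first stage, a permutation test used as a feature selection filter, can be illustrated with a minimal sketch. The abstract does not specify the test statistic, the number of permutations, or the significance threshold, so the absolute difference in class means, `n_permutations`, and `alpha` below are assumptions chosen for illustration, and the function names are hypothetical rather than taken from the paper.

```python
import numpy as np

def permutation_pvalues(X, y, n_permutations=1000, seed=0):
    """Per-feature permutation-test p-values for a two-class dataset.

    Assumed statistic: absolute difference between the class means.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)

    def mean_diff(labels):
        return np.abs(X[labels].mean(axis=0) - X[~labels].mean(axis=0))

    observed = mean_diff(y)
    exceed = np.zeros(X.shape[1])
    for _ in range(n_permutations):
        # Shuffling the labels simulates the null hypothesis that the
        # feature is unrelated to the class.
        exceed += mean_diff(rng.permutation(y)) >= observed
    # Add-one correction keeps every p-value strictly positive.
    return (exceed + 1) / (n_permutations + 1)

def select_features(X, y, alpha=0.05, **kwargs):
    """Return indices of features whose p-value is at most alpha."""
    return np.flatnonzero(permutation_pvalues(X, y, **kwargs) <= alpha)
```

Because the filter scores each feature independently of any classifier, it can be run once before the later stages; no correction for multiple testing is applied in this sketch.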

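The second stage, which combines re-sampling with ensemble learning, can be sketched as follows. This is only an illustration under stated assumptions: each ensemble member is a logistic regression model trained on all minority-class examples plus an equally sized random subset of the majority class, and predictions are averaged. Minority over-sampling, which the abstract also mentions, is not shown, and the gradient-descent settings (`lr`, `n_iter`, `l2`), the ensemble size, and the function names are hypothetical rather than the authors' configuration.

```python
import numpy as np

def _add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic(X, y, lr=0.1, n_iter=1000, l2=1e-3):
    """Batch gradient descent for L2-regularised logistic regression."""
    Xb = _add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * (Xb.T @ (p - y) / len(y) + l2 * w)
    return w

def fit_balanced_ensemble(X, y, n_models=15, seed=0):
    """Train one model per balanced subsample of the training data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    models = []
    for _ in range(n_models):
        # Under-sample the majority class down to the minority class size.
        sampled = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([minority, sampled])
        models.append(fit_logistic(X[idx], y[idx]))
    return models

def ensemble_proba(models, X):
    """Average the positive-class probabilities of all ensemble members."""
    Xb = _add_bias(np.asarray(X, dtype=float))
    return np.mean([1.0 / (1.0 + np.exp(-Xb @ w)) for w in models], axis=0)
```

Each member sees a different slice of the majority class, so averaging uses more of the majority data than a single under-sampled model would while keeping every training set balanced.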

Bibliographic record

Source
Journal of Biomedical Informatics | 2004, issue 4 | 16 pages
Authors
Radivojac P; Chawla NV; Dunker AK; Obradovic Z
Original format: PDF
Language of text: eng
Chinese Library Classification: Basic medicine
Keywords
Diagnostic Neoplasm Staging; Noise; Classification; Databases; Protein

