Classifying Protein Fingerprints


Abstract

Protein fingerprints are groups of conserved motifs that can be used as diagnostic signatures to identify and characterize collections of protein sequences. These fingerprints are stored in the PRINTS database after time-consuming annotation by domain experts, who must first determine the fingerprint type, i.e., whether a fingerprint depicts a protein family, superfamily or domain. To alleviate the annotation bottleneck, a system called PRECIS has been developed that automatically generates PRINTS records, which are provisionally stored in a supplement called prePRINTS. One limitation of PRECIS is that its classification heuristics, handcoded by proteomics experts, often misclassify fingerprint type; their error rate has been estimated at 40%. This paper reports on an attempt to build more accurate classifiers based on information drawn from the fingerprints themselves and from the SWISS-PROT database. Extensive experimentation using 10-fold cross-validation led to the selection of a model combining the ReliefF feature selector with an SVM-RBF learner. The final model's error rate was estimated at 14.1% on a blind test set, a gain of roughly 26 percentage points in accuracy over PRECIS's handcrafted rules.
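The reported gain can be checked with simple arithmetic on the two error rates quoted in the abstract (40% for the PRECIS heuristics, 14.1% for the final model). The sketch below is illustrative only: the function name `accuracy_gain` and the returned keys are hypothetical, and it assumes the two error rates are directly comparable, which the abstract does not state explicitly.

```python
def accuracy_gain(baseline_error: float, model_error: float) -> dict:
    """Compare two error rates given as fractions in [0, 1].

    Hypothetical helper for illustration; not part of PRECIS or PRINTS.
    """
    if not (0.0 < baseline_error <= 1.0 and 0.0 <= model_error <= 1.0):
        raise ValueError("error rates must be fractions, baseline > 0")

    # Absolute gain: difference in accuracy, in percentage points.
    absolute_gain = baseline_error - model_error
    # Relative reduction: share of the baseline's errors that are removed.
    relative_reduction = absolute_gain / baseline_error

    return {
        "baseline_accuracy": 1.0 - baseline_error,
        "model_accuracy": 1.0 - model_error,
        "absolute_gain_points": absolute_gain * 100.0,
        "relative_error_reduction": relative_reduction,
    }


# Figures taken from the abstract: 40% (PRECIS rules) vs. 14.1% (final model).
result = accuracy_gain(0.40, 0.141)
print(f"absolute gain: {result['absolute_gain_points']:.1f} points")
print(f"relative error reduction: {result['relative_error_reduction']:.2%}")
```

Under these assumptions the absolute gain is about 25.9 percentage points, which matches the quoted 26% figure, while the relative reduction in error is closer to 65%. The distinction matters when comparing this result with papers that report relative improvements.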
