Classifying Protein Fingerprints

Abstract

Protein fingerprints are groups of conserved motifs that can be used as diagnostic signatures to identify and characterize collections of protein sequences. These fingerprints are stored in the PRINTS database after time-consuming annotation by domain experts, who must first determine the fingerprint type, i.e., whether a fingerprint depicts a protein family, superfamily or domain. To alleviate the annotation bottleneck, a system called PRECIS has been developed which automatically generates PRINTS records, provisionally stored in a supplement called prePRINTS. One limitation of PRECIS is that its classification heuristics, handcoded by proteomics experts, often misclassify fingerprint type; their error rate has been estimated at 40%. This paper reports on an attempt to build more accurate classifiers based on information drawn from the fingerprints themselves and from the SWISS-PROT database. Extensive experimentation using 10-fold cross-validation led to the selection of a model combining the ReliefF feature selector with an SVM-RBF learner. The final model's error rate was estimated at 14.1% on a blind test set, a gain of roughly 26 percentage points in accuracy (40% minus 14.1%) over PRECIS' handcrafted rules.
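
To make the feature-selection step more concrete, the following is a minimal sketch of ReliefF-style feature weighting in plain Python. It is not the authors' implementation: the abstract does not list the actual features or parameters, so the function name `relieff`, the choice `k=1`, the use of every instance rather than a random sample, and the two-feature toy data are all assumptions made for illustration. The SVM-RBF learner and the 10-fold cross-validation loop are not shown.

```python
from collections import Counter


def relieff(X, y, k=1):
    """Return one ReliefF-style weight per feature (higher = more relevant).

    Simplified sketch: numeric features only, every instance is used once
    (no random sampling), and distance is the sum of range-normalised
    per-feature differences.
    """
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for a in range(d):
        col = [row[a] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, r, s):
        return abs(r[a] - s[a]) / spans[a]

    def dist(r, s):
        return sum(diff(a, r, s) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        r, cr = X[i], y[i]
        for c, p in priors.items():
            # k nearest neighbours of r inside class c, excluding r itself.
            pool = [X[j] for j in range(n) if j != i and y[j] == c]
            near = sorted(pool, key=lambda s: dist(r, s))[:k]
            if not near:
                continue
            # Same-class neighbours (hits) lower the weight; other-class
            # neighbours (misses) raise it, scaled by the class prior.
            scale = -1.0 if c == cr else p / (1.0 - priors[cr])
            for a in range(d):
                w[a] += scale * sum(diff(a, r, s) for s in near) / (n * len(near))
    return w


# Hypothetical toy data: feature 0 tracks the class, feature 1 is noise.
X = [
    [0.0, 0.3], [0.1, 0.9], [0.2, 0.5],  # labelled "family"
    [1.0, 0.4], [1.1, 0.8], [1.2, 0.1],  # labelled "superfamily"
    [2.0, 0.7], [2.1, 0.2], [2.2, 0.6],  # labelled "domain"
]
y = ["family"] * 3 + ["superfamily"] * 3 + ["domain"] * 3

weights = relieff(X, y, k=1)
print(weights)  # feature 0 should receive the larger weight
```

In a pipeline of the kind the abstract describes, features would be ranked by such weights and only the top-ranked ones passed to the classifier; how many features were kept in the paper is not stated here.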
