Protein classification artificial neural system.


Abstract

A neural network classification method is developed as an alternative approach to the large database search/organization problem. The system, termed Protein Classification Artificial Neural System (ProCANS), has been implemented on a Cray supercomputer for rapid superfamily classification of unknown proteins based on the information content of the neural interconnections. The system employs an n-gram hashing function, similar to the k-tuple method, for sequence encoding. A collection of modular back-propagation networks is used to store the large number of sequence patterns. The system has been trained and tested with the first 2,148 of the 8,309 entries of the annotated Protein Identification Resource protein sequence database (release 29). The entries included the electron transfer proteins and the six enzyme groups (oxidoreductases, transferases, hydrolases, lyases, isomerases, and ligases), with a total of 620 superfamilies. After a total training time of seven Cray central processing unit (CPU) hours, the system reached a predictive accuracy of 90%. Classification is fast (0.1 Cray CPU second per sequence) because it involves only a forward pass through the networks. The classification time on a full-scale system embedded with all known superfamilies is estimated to be within 1 CPU second. Although the training time will grow linearly with the number of entries, the classification time is expected to remain low even with a 10- to 100-fold increase in sequence entries. The neural database, which consists of the set of weight matrices of the networks, can be ported together with the ProCANS software to other computers and made available to the genome community. Rapid and accurate superfamily classification would be valuable for the organization of protein sequence databases and for gene recognition in large sequencing projects.
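
The abstract does not give the details of the n-gram hashing function or the network layout, so the following is only a minimal sketch of the general idea, not the ProCANS implementation. It assumes a 20-letter amino acid alphabet, bigrams (n = 2), normalized counts as input features, and a single-hidden-layer sigmoid network; the names `ngram_encode` and `forward` are hypothetical.

```python
import math
from itertools import product

# Assumption: the 20 standard amino acids form the encoding alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_encode(sequence, n=2, alphabet=AMINO_ACIDS):
    """Map a sequence of any length to a fixed-length vector of
    normalized n-gram counts (one slot per possible n-gram)."""
    index = {"".join(g): i for i, g in enumerate(product(alphabet, repeat=n))}
    counts = [0.0] * len(index)
    for start in range(len(sequence) - n + 1):
        gram = sequence[start:start + n]
        if gram in index:  # skip n-grams containing non-standard symbols
            counts[index[gram]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(vector, hidden_weights, output_weights):
    """One forward pass through a single-hidden-layer sigmoid network.
    Each weight matrix is a list of rows, one row per unit (no bias terms)."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, vector)))
              for row in hidden_weights]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in output_weights]
```

Because the input vector has a fixed length regardless of sequence length, classifying a new sequence costs one encoding step plus one forward pass per network module, which is consistent with the abstract's point that classification time stays low as the database grows.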
