Misannotation of protein-coding genes in public databases degrades the quality of those databases. Shewanella oneidensis MR-1 is an excellent exoelectrogen in microbial fuel cells, with important applications in bioenergy and environmental remediation, which makes it significant for exoelectrogen genomics research. In this study, the protein-coding genes of the S. oneidensis MR-1 genome were re-annotated with a hybrid approach that combines several computational methods, predicting both over-annotated genes and missing (under-annotated) genes. As a result, 2 over-annotated protein-coding genes were identified and 289 new protein-coding genes were predicted. Evaluated on protein-coding genes of known function, the over-annotation algorithm achieved an accuracy (Ac), Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) of 99.90%, 0.9982, and 0.9999, respectively. In addition, function prediction for the 289 missing genes with BLAST and COG showed that 152 of them have a clear biological function, providing data support for further in-depth research.
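The abstract reports Ac, MCC, and AUC without giving their formulas. The sketch below shows the standard textbook definitions of these three measures for a binary classifier (for example, "true coding gene" versus "over-annotated ORF"). It is an illustration under assumptions, not the authors' evaluation code: the function names and every number in the usage note are hypothetical, and the paper's exact evaluation protocol is not stated in the abstract.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to 1.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here for illustration).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one;
    tied scores count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

As a usage illustration with made-up counts, `accuracy(50, 45, 3, 2)` evaluates the share of correct calls among 100 genes, and `auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])` compares each positive score against each negative score. MCC is often reported alongside accuracy because it stays informative when the two classes differ greatly in size.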