Journal: GigaScience

CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems

Abstract

Background: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for genome engineering in eukaryotic models.

Results: We introduce CRISPRcasIdentifier, a new machine learning–based tool that combines regression and classification models for the prediction of potentially missing proteins in instances of CRISPR-Cas systems and the prediction of their respective subtypes. In contrast to other available tools, CRISPRcasIdentifier can both detect cas genes and extract potential association rules that reveal functional modules for CRISPR-Cas systems. In our experimental benchmark on the most recently published and comprehensive CRISPR-Cas system dataset, CRISPRcasIdentifier was compared with recent and state-of-the-art tools. According to the experimental results, CRISPRcasIdentifier presented the best Cas protein identification and subtype classification performance.

Conclusions: Overall, our tool greatly extends the classification of CRISPR cassettes and, for the first time, predicts missing Cas proteins and association rules between Cas proteins. Additionally, we investigated the properties of CRISPR subtypes. The proposed tool relies not only on the knowledge of manual CRISPR annotation but also on models trained using machine learning.
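The abstract describes combining a regression step (estimating potentially missing Cas proteins) with a classification step (assigning a subtype). The following is a minimal sketch of that two-stage idea only, not the tool's actual implementation: the toy scores, the labels "I" and "II", and the helper names `knn_regress`, `impute`, and `classify` are all assumptions made for illustration, and a k-nearest-neighbour regressor and a nearest-centroid classifier stand in for whatever models CRISPRcasIdentifier really uses.

```python
from math import dist
from statistics import mean

# Hypothetical feature set: one made-up score per Cas protein (not real data).
PROTEINS = ["cas1", "cas2", "cas3", "cas9"]

# Toy training cassettes with invented scores and illustrative labels.
TRAIN = [
    ({"cas1": 0.9, "cas2": 0.8, "cas3": 0.9, "cas9": 0.0}, "I"),
    ({"cas1": 0.8, "cas2": 0.9, "cas3": 0.8, "cas9": 0.0}, "I"),
    ({"cas1": 0.9, "cas2": 0.7, "cas3": 0.0, "cas9": 0.9}, "II"),
    ({"cas1": 0.7, "cas2": 0.8, "cas3": 0.0, "cas9": 0.8}, "II"),
]


def knn_regress(rows, query, target, k=2):
    """Regression step: estimate query[target] as the mean target score of
    the k training rows closest to the query on its observed proteins."""
    observed = [p for p in PROTEINS if p != target and query.get(p) is not None]
    q_vec = [query[p] for p in observed]
    nearest = sorted(rows, key=lambda r: dist([r[p] for p in observed], q_vec))[:k]
    return mean(r[target] for r in nearest)


def impute(rows, query):
    """Fill every missing (None) protein score using the regression step."""
    filled = dict(query)
    for p in PROTEINS:
        if filled.get(p) is None:
            filled[p] = knn_regress(rows, query, p)
    return filled


def classify(train, cassette):
    """Classification step: pick the label whose centroid is nearest."""
    centroids = {}
    for label in {lab for _, lab in train}:
        rows = [r for r, lab in train if lab == label]
        centroids[label] = [mean(r[p] for r in rows) for p in PROTEINS]
    vec = [cassette[p] for p in PROTEINS]
    return min(centroids, key=lambda lab: dist(vec, centroids[lab]))


# A cassette whose cas3 score is unknown.
query = {"cas1": 0.85, "cas2": 0.8, "cas3": None, "cas9": 0.85}
filled = impute([row for row, _ in TRAIN], query)
subtype = classify(TRAIN, filled)
print(filled["cas3"], subtype)
```

In this toy run the missing score is estimated from the two most similar training cassettes before the subtype is assigned, which mirrors the order of operations the abstract implies (predict what is missing, then classify).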
