Conference: IEEE International Symposium on Technologies for Homeland Security

Big data biology-based predictive models Via DNA-metagenomics binning for WMD events applications

Abstract

In WMD events or natural disasters, rapidly identifying biochemicals and microorganisms is crucial. Metagenomics is the study of microorganisms collected directly from natural environments using whole-genome shotgun (WGS) sequencing. Metagenomics methods allow sequencing of the genomes of organisms that cannot be cultured in a laboratory. Grouping the random fragments obtained from whole-genome shotgun data according to the organism they came from is known as binning. Metagenomics methods allow quick sequencing of microbes obtained from natural disaster sites, so that the microbes can be identified and a rapid, timely response provided: for example, rapid environmental cleanup and restoration, rapid quarantine of objects, animals, or humans, and recovery. In this paper we propose machine-learning-based predictive feature selection algorithms for DNA sequences that solve binning problems more accurately and efficiently. As features, we use subsequence blocks extracted from organisms' protein domains. We analyze and compare the binning prediction results obtained using k-mers, codons, and subsequence blocks derived from conserved protein domains, and show that subsequence blocks derived from conserved protein domains give better prediction accuracy than k-mers or codons. We also present a comparative analysis of binning predictive models built with a Naïve Bayes classifier and a Random Forest classifier on the feature set derived from conserved protein domains. Our analysis shows that the Random Forest classifier achieves better classification accuracy than the Naïve Bayes classifier.
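
The k-mer and codon features compared in the abstract, together with a Naïve Bayes binning step over them, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `kmer_counts`, `codon_counts` and `KmerNaiveBayes`, the default k = 4, the single reading frame for codons, and add-alpha (Laplace) smoothing are all choices made here for illustration. The conserved-protein-domain subsequence blocks that the paper favors are not reproduced, because the abstract does not describe how they are extracted, and the Random Forest comparison is left out to keep the example dependency-free.

```python
# Illustrative sketch only; names, parameters and toy data are hypothetical, not from the paper.
import math
from collections import Counter
from itertools import product

BASES = set("ACGT")


def kmer_counts(seq, k=4):
    """Count overlapping k-mers; windows containing non-ACGT symbols (e.g. N) are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= BASES:
            counts[word] += 1
    return counts


def codon_counts(seq, frame=0):
    """Count non-overlapping codons (triplets) in a single reading frame (0, 1 or 2)."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return Counter(c for c in codons if set(c) <= BASES)


class KmerNaiveBayes:
    """Multinomial Naive Bayes over k-mer counts, with add-alpha (Laplace) smoothing."""

    def __init__(self, k=4, alpha=1.0):
        self.k = k
        self.alpha = alpha
        # Fixed vocabulary: all 4**k possible k-mers over A/C/G/T.
        self.vocab = ["".join(p) for p in product("ACGT", repeat=k)]

    def fit(self, seqs, labels):
        class_sizes = Counter(labels)
        self.classes = sorted(class_sizes)
        n = len(labels)
        self.log_prior = {c: math.log(class_sizes[c] / n) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            # Pool k-mer counts over all training fragments assigned to bin c.
            pooled = Counter()
            for s, y in zip(seqs, labels):
                if y == c:
                    pooled.update(kmer_counts(s, self.k))
            denom = sum(pooled.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((pooled[w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, seq):
        """Return the bin with the highest log-posterior for one fragment."""
        counts = kmer_counts(seq, self.k)
        scores = {
            c: self.log_prior[c] + sum(n * self.log_lik[c][w] for w, n in counts.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy usage with hypothetical, made-up fragments: a GC-rich bin and an AT-rich bin.
train_seqs = ["GCGCGGCCGCGGGCCC", "GGCCGCGCGCCGGGCC", "ATATTAATATTTAAAT", "TTAATATAAATTATAT"]
train_labels = ["bin_A", "bin_A", "bin_B", "bin_B"]
model = KmerNaiveBayes(k=2).fit(train_seqs, train_labels)
print(model.predict("GCCGGCGCGC"))  # bin_A
print(model.predict("ATTATAAT"))    # bin_B
print(codon_counts("ATGAAATGA"))    # Counter({'ATG': 1, 'AAA': 1, 'TGA': 1})
```

Each fragment is assigned to whichever bin gives the highest log-posterior; working in log space avoids numerical underflow when many k-mer probabilities are multiplied. Reproducing the paper's Random Forest comparison would typically mean turning the counts into fixed-length vectors over `vocab` and passing them to a library implementation such as scikit-learn's `RandomForestClassifier`.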