Conference: IEEE International Symposium on Technologies for Homeland Security

Big data biology-based predictive models via DNA-metagenomics binning for WMD event applications


Abstract

In WMD events or natural disasters, rapidly identifying bio-chemicals and microorganisms is crucial. Metagenomics is the study of microorganisms collected directly from natural environments using whole genome shotgun (WGS) sequencing. Metagenomics methods allow sequencing of the genomes of organisms that cannot be cultured in a laboratory. Grouping the random fragments obtained from WGS data is known as binning. Metagenomics methods allow quick sequencing of microbes obtained from natural disaster sites, so that the microbes can be identified and a rapid, timely response provided, for example rapid environmental cleanup and restoration, rapid quarantine of objects, animals, and humans, and recovery. In this paper we propose machine-learning-based predictive DNA sequence feature selection algorithms to solve binning problems more accurately and efficiently. We use sub-sequence blocks extracted from organism protein domains as features. We analyze and compare the binning prediction results obtained using k-mers, using codons, and using sub-sequence blocks derived from conserved protein domains. We show that sub-sequence blocks derived from conserved protein domains give better prediction accuracy than k-mers or codons. We also present a comparative analysis of binning predictive models using a Naïve Bayes classifier and a Random Forest classifier, with the feature set derived from conserved protein domains. Our analysis shows that the Random Forest classifier yields better classification accuracy than the Naïve Bayes classifier.
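To make the k-mer and codon baselines mentioned in the abstract concrete, the following is a minimal sketch of how such features might be computed from a single DNA fragment. It is not the authors' implementation: the function names (`kmer_counts`, `codon_counts`, `to_frequency_vector`), the choice to skip windows containing ambiguous bases, and the example fragment are all assumptions made for illustration. The conserved-protein-domain features the paper proposes are not reproduced here, since the abstract does not specify how they are extracted.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers; windows with non-ACGT characters are skipped (an assumption)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def codon_counts(seq, frame=0):
    """Count non-overlapping triplets, reading from the given frame offset (0, 1, or 2)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if all(base in ALPHABET for base in codon):
            counts[codon] += 1
    return counts


def to_frequency_vector(counts, k):
    """Turn counts into a fixed-order, normalized vector over all 4**k possible k-mers."""
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Hypothetical fragment, used only to show the call pattern.
fragment = "ATGGCGTGAATG"
kmer_features = to_frequency_vector(kmer_counts(fragment, 2), 2)    # 16 values
codon_features = to_frequency_vector(codon_counts(fragment), 3)     # 64 values
```

Vectors of this kind, one per fragment, could then be passed with their bin labels to an off-the-shelf classifier such as a Naïve Bayes or Random Forest implementation for the sort of comparison the abstract describes.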
