Journal: Water Research

Predicting the performance of anaerobic digestion using machine learning algorithms and genomic data

Abstract

Modeling of anaerobic digestion (AD) is crucial to better understand the process dynamics and to improve digester performance. This is an essential yet difficult task due to the complex and unknown interactions within the system. The application of well-developed data mining technologies, such as machine learning (ML), and microbial gene sequencing techniques is promising for overcoming these challenges. In this study, we investigated the feasibility of 6 ML algorithms using genomic data and their corresponding operational parameters from 8 research groups to predict methane yield. For classification models, random forest (RF) achieved accuracies of 0.77 using operational parameters alone and 0.78 using genomic data at the bacterial phylum level alone. The combination of operational parameters and genomic data improved the prediction accuracy to 0.82 (p < 0.05). For regression models, a low root mean square error of 0.04 (relative root mean square error = 8.6%) was obtained by a neural network using genomic data at the bacterial phylum level alone. Feature importance analysis by RF suggested that Chloroflexi, Actinobacteria, Proteobacteria, Fibrobacteres, and Spirochaeta were the top 5 most important phyla, although their relative abundances ranged only from 0.1% to 3.1%. The important features identified could provide guidance for early warning and proactive management of microbial communities. This study demonstrated the promising application of ML techniques for predicting and controlling AD performance. (c) 2021 Elsevier Ltd. All rights reserved.
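
The abstract reports three kinds of scores: classification accuracy, root mean square error (RMSE), and relative RMSE. The sketch below shows one way these could be computed. It is not the authors' code. The abstract does not define relative RMSE, so dividing RMSE by the mean observed value is an assumption here, and the function names and sample numbers are hypothetical illustrations, not data from the study.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of class labels that were predicted correctly."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_rmse(y_true, y_pred):
    """RMSE divided by the mean observed value.

    Assumption: the abstract does not state which normalization was used;
    the mean of the observations is one common choice.
    """
    mean_observed = sum(y_true) / len(y_true)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero")
    return rmse(y_true, y_pred) / mean_observed


# Made-up values for illustration only (not taken from the study).
observed_yield = [0.40, 0.50, 0.60]
predicted_yield = [0.45, 0.50, 0.55]
observed_class = ["high", "low", "high", "low"]
predicted_class = ["high", "low", "low", "low"]

print(f"accuracy      = {accuracy(observed_class, predicted_class):.2f}")
print(f"RMSE          = {rmse(observed_yield, predicted_yield):.3f}")
print(f"relative RMSE = {relative_rmse(observed_yield, predicted_yield):.1%}")
```

Under this assumed definition, a relative RMSE expresses the typical prediction error as a percentage of the average methane yield, which makes it comparable across datasets with different yield scales.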
