Journal: Processes

A Machine Learning-based Pipeline for the Classification of CTX-M in Metagenomics Samples

Abstract

Bacterial infections are a major global concern because they can lead to public health problems. Bioinformatics contributes extensively to addressing this issue through the analysis and interpretation of in silico data, which makes it possible to genetically characterize different individuals or strains, such as bacterial strains. However, the growing volume of metagenomic data requires new infrastructure, technologies, and methodologies that support the analysis and prediction of this information from a clinical point of view, which is the aim of this work. Distributed computational environments make it possible to manage these large volumes of data, thanks to significant advances in processing architectures such as multicore CPUs (Central Processing Units) and GPGPUs (General-Purpose Graphics Processing Units). To this end, we developed a bioinformatics workflow based on metagenomic data filtered with the Duk tool. Data formatting was done with the EMBOSS software and a prototype workflow. A machine learning-based pipeline was also designed and implemented as a bash script. In addition, the Python 3 programming language was used to normalize the training data of the artificial neural network, which was implemented in the TensorFlow framework and whose behavior was visualized in TensorBoard. Finally, the values from the initial bioinformatics process and the data generated during the parameterization and optimization of the artificial neural network are presented and validated based on the best result obtained for the identification of the CTX-M gene group.
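
The abstract says the training data were normalized in Python 3 before being passed to the neural network, but it does not say which normalization was used. As an illustration only, the sketch below assumes min-max scaling of each feature column to the range [0, 1]. The function name `min_max_normalize` and the sample feature rows are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to [0, 1] (assumed method, not confirmed by the paper)."""
    if not rows:
        return []
    # Transpose the rows so that each tuple holds one feature column.
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical numeric feature rows, one row per sample; not data from the paper.
features = [[2.0, 10.0, 5.0], [4.0, 20.0, 5.0], [6.0, 40.0, 5.0]]
scaled_features = min_max_normalize(features)
```

Scaling each column independently keeps features with large numeric ranges from dominating the network's weight updates. The minima and maxima should be computed on the training set only and then reused for validation and test data.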
