Cairo International Biomedical Engineering Conference

A Hybrid Machine Learning Approach for the Phenotypic Classification of Metagenomic Colon Cancer Reads Based on Kmer Frequency and Biomarker Profiling

Abstract

The human microbiome plays a critical role in health and in the environment. Colorectal cancer (CRC) is one of the leading causes of cancer death in many countries, so early diagnosis of CRC may help increase survival rates. Tracking changes in the structure of the human gut microbiome opens new avenues for detecting CRC and predicting CRC risk. Machine learning has recently become a powerful technique in many bioinformatics fields, including metagenomics, the study of collections of microbial genomes isolated and sequenced directly from their natural habitats. Applications of machine learning in metagenomics are numerous and include phenotype classification, taxonomic assignment, and sequence annotation. Phenotype classification assigns each sample a phenotypic class, such as diseased or healthy, according to the available metadata. In metagenomics it is usually performed on operational taxonomic unit (OTU) tables, which are extracted as a core step of metagenomic analysis. Alternatively, methods borrowed from natural language processing (NLP), such as k-mer frequency counting, can supply features for a machine learning model. In this study, we combined biomarker profiling with a k-mer frequency table approach to classify colorectal cancer from metagenomic data.
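The abstract does not state the k-mer length, the biomarker set, or the classifier the authors used. The following is therefore only a minimal sketch of the general idea under explicit assumptions. Each sample's reads are pooled into one normalized k-mer frequency vector. That vector is then concatenated with a hypothetical per-sample biomarker abundance vector to form a "hybrid" feature vector. A plain nearest-centroid rule stands in for the classifier; it is a swapped-in technique, not the paper's model. All function names and toy data are illustrative. k = 2 keeps the toy example readable, and real analyses typically use longer k-mers.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k):
    """All 4**k possible k-mers in lexicographic order, giving a fixed feature order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(reads, k):
    """Normalized k-mer frequency vector pooled over all reads of one sample.

    Assumption: features are computed per sample (not per read), and any
    k-mer containing a non-ACGT symbol (e.g. 'N') is skipped.
    """
    vocab = kmer_vocabulary(k)
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if all(c in ALPHABET for c in kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[m] / total for m in vocab]


def hybrid_features(reads, biomarker_abundances, k):
    """Concatenate the k-mer profile with a (hypothetical) biomarker abundance vector."""
    return kmer_frequencies(reads, k) + list(biomarker_abundances)


def fit_centroids(X, y):
    """Nearest-centroid 'training': the mean feature vector of each phenotype label."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    k = 2
    # Made-up toy samples: (reads, hypothetical biomarker abundances, phenotype)
    samples = [
        (["AAAAAAAA", "AAAACAAA"], [0.8, 0.1], "CRC"),
        (["AAAAAAGA"], [0.7, 0.2], "CRC"),
        (["CGCGCGCG", "GCGCGCGC"], [0.1, 0.9], "healthy"),
        (["CGCGCGTG"], [0.2, 0.8], "healthy"),
    ]
    X = [hybrid_features(reads, bio, k) for reads, bio, _ in samples]
    y = [label for _, _, label in samples]
    centroids = fit_centroids(X, y)
    query = hybrid_features(["AAAAAAAC"], [0.75, 0.15], k)
    print(predict(centroids, query))
```

Concatenating the two profiles lets one model weigh sequence-composition signal and taxon-level biomarker signal together. In practice the two blocks would usually be scaled or weighted relative to each other, and a stronger classifier would replace the nearest-centroid rule used here.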
