BMC Genomics

A supervised learning approach for taxonomic classification of core-photosystem-II genes and transcripts in the marine environment

Abstract

Background: Cyanobacteria of the genera Synechococcus and Prochlorococcus play a key role in marine photosynthesis, which contributes to the global carbon cycle and to the world oxygen supply. Recently, genes encoding the photosystem II reaction center (psbA and psbD) were found in cyanophage genomes. This finding suggested that horizontal transfer of these genes may be involved in increasing phage fitness. To date, only a very small percentage of marine bacteria and phages has been cultured. Thus, mapping genomic data extracted directly from the environment to its taxonomic origin is necessary for a better understanding of phage-host relationships and dynamics.

Results: To achieve accurate and rapid taxonomic classification, we employed a computational approach combining a multi-class Support Vector Machine (SVM) with a codon usage position specific scoring matrix (cuPSSM). Our method has been applied successfully to classify core-photosystem-II gene fragments, including partial sequences coming directly from the ocean, into seven different taxonomic classes. Applying the method to a large set of DNA and RNA psbA clones from the Mediterranean Sea, we studied the distribution of cyanobacterial psbA genes and transcripts in their natural environment. Using our approach, we were able to simultaneously examine taxonomic and ecological distributions in the marine environment.

Conclusion: The ability to accurately classify the origin of individual genes and transcripts coming directly from the environment is of great importance in studying marine ecology. The classification method presented in this paper could be applied further to classify other genes amplified from the environment, for which training data is available.
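
The abstract does not spell out how the cuPSSM is built or how it feeds the SVM, so the following is only a minimal sketch of one plausible reading: a per-class matrix of position-specific codon log-odds (against a uniform 64-codon background, with a pseudocount), whose per-class scores form a feature vector for a fragment. All function names, the pseudocount, the class labels, and the toy sequences are hypothetical and are not taken from the paper. The SVM step itself is omitted; a simple highest-score rule stands in for it here.

```python
import math
from collections import Counter


def codons(seq):
    """Split an in-frame nucleotide sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def build_cupssm(aligned_seqs, pseudocount=1.0):
    """Build one (scores, unseen_score) pair per codon position.

    Scores are log-odds of the smoothed codon frequency at that position
    against a uniform background of 1/64 (an assumption of this sketch).
    """
    pssm = []
    for column in zip(*(codons(s) for s in aligned_seqs)):
        counts = Counter(column)
        total = len(column) + 64 * pseudocount
        scores = {c: math.log((n + pseudocount) / total * 64) for c, n in counts.items()}
        unseen = math.log(pseudocount / total * 64)
        pssm.append((scores, unseen))
    return pssm


def score(pssm, fragment, offset=0):
    """Sum the log-odds of the fragment's codons, starting at codon position `offset`."""
    total = 0.0
    for pos, codon in enumerate(codons(fragment), start=offset):
        if pos >= len(pssm):
            break
        scores, unseen = pssm[pos]
        total += scores.get(codon, unseen)
    return total


def feature_vector(pssms, fragment, offset=0):
    """One score per class, in sorted class-name order; an SVM could take this as input."""
    return [score(pssms[name], fragment, offset) for name in sorted(pssms)]


# Toy, made-up training sequences for two hypothetical classes.
training = {
    "class_A": ["ATGACTGCAATT", "ATGACTGCTATT", "ATGACCGCAATT"],
    "class_B": ["ATGACAGCGATC", "ATGACAGCGATA", "ATGACGGCGATC"],
}
pssms = {name: build_cupssm(seqs) for name, seqs in training.items()}

fragment = "ATGACTGCAATT"
features = feature_vector(pssms, fragment)
# Stand-in for the SVM decision: pick the class whose matrix scores highest.
predicted = max(pssms, key=lambda name: score(pssms[name], fragment))
```

The `offset` argument reflects that environmental fragments are partial: a fragment has to be placed at the right codon position in the reference alignment before it is scored. How that placement is done is outside this sketch.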