Journal: Plant and Cell Physiology

Systematization of the protein sequence diversity in enzymes related to secondary metabolic pathways in plants, in the context of big data biology inspired by the KNApSAcK Motorcycle database. (Special Focus Issue: Phytochemical genomics.)


Abstract

Biology is increasingly becoming a data-intensive science with the recent progress of the omics fields, e.g. genomics, transcriptomics, proteomics and metabolomics. The species-metabolite relationship database, KNApSAcK Core, has been widely utilized and cited in metabolomics research, and chronological analysis of that research work has helped to reveal recent trends in metabolomics research. To meet the needs of these trends, the KNApSAcK database has been extended by incorporating a secondary metabolic pathway database called Motorcycle DB. We examined the enzyme sequence diversity related to secondary metabolism by means of batch-learning self-organizing maps (BL-SOMs). Initially, we constructed a map by using a big data matrix consisting of the frequencies of all possible dipeptides in the protein sequence segments of plants and bacteria. The enzyme sequence diversity of the secondary metabolic pathways was examined by identifying clusters of segments associated with certain enzyme groups in the resulting map. The extent of diversity of 15 secondary metabolic enzyme groups is discussed. Data-intensive approaches such as BL-SOM applied to big data matrices are needed for systematizing protein sequences. Handling big data has become an inevitable part of biology.

Digital Object Identifier: http://dx.doi.org/10.1093/pcp/pct041
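The dipeptide frequency matrix mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The segment length of 200 residues, the non-overlapping segmentation, the normalization to relative frequencies, and the skipping of non-standard residues are all assumptions made for illustration. The names split_segments, dipeptide_frequencies and build_matrix are hypothetical. The sketch only builds the input matrix (one row per segment, 400 columns for the 20 x 20 possible dipeptides) and does not perform the BL-SOM training itself.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 20 x 20 = 400 possible dipeptides, one matrix column each.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def split_segments(sequence, segment_length):
    """Cut a sequence into non-overlapping segments (assumed scheme).

    A trailing piece shorter than segment_length is dropped.
    """
    last_start = len(sequence) - segment_length
    return [sequence[i:i + segment_length]
            for i in range(0, last_start + 1, segment_length)]


def dipeptide_frequencies(segment):
    """Return a 400-element list of relative dipeptide frequencies."""
    counts = [0] * len(DIPEPTIDES)
    for i in range(len(segment) - 1):
        column = INDEX.get(segment[i:i + 2])
        if column is not None:  # assumption: skip pairs with non-standard residues
            counts[column] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def build_matrix(sequences, segment_length=200):
    """Stack one frequency row per segment; segment_length is an assumed value."""
    return [dipeptide_frequencies(segment)
            for sequence in sequences
            for segment in split_segments(sequence, segment_length)]
```

Each row of the resulting matrix would then serve as one input vector to a self-organizing map, so that segments with similar dipeptide composition land in nearby map cells.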
