Published in: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic Topic Modeling



Abstract

In this paper, we present a method that enables both the homology-based approach and the composition-based approach to further study the functional core (i.e., the microbial core and the gene core, respectively). In the proposed method, major functional groups are identified by generative topic modeling, which can extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model treats each sample as a "document" that contains a mixture of functional groups, while each functional group (also known as a "latent topic") is a weighted mixture of species. Therefore, estimating the generative topic model for taxon abundance data uncovers the distribution over latent functions (latent topics) in each sample. Second, we show that a generative topic model can also be used to study the genome-level composition of "N-mer" features (DNA subreads obtained by composition-based approaches). The model treats each genome as a mixture of latent genetic patterns (latent topics), while each pattern is a weighted mixture of the "N-mer" features; thus, the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of the uncovered latent genetic patterns. The experimental results demonstrate the effectiveness of the proposed method.
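The "genome as document, N-mer as word" view described in the abstract can be illustrated with a minimal sketch. The abstract does not name the specific topic model or inference procedure, so the code below assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, and it assumes N-mers are overlapping length-n substrings. The function names (`nmer_counts`, `fit_lda`), the hyperparameters, and the toy sequences are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random
from collections import Counter

def nmer_counts(sequence, n):
    """Count overlapping N-mers (length-n substrings) in one sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

def fit_lda(docs, num_topics, vocab_size, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (an assumed choice of topic model).

    docs: list of documents, each a list of integer word ids.
    Returns (theta, phi): per-document topic mixtures and
    per-topic word mixtures, each row summing to 1.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi

# Toy "genomes" (made-up sequences, not real data).
genomes = {"g1": "ACGTACGTACGT", "g2": "ACGTACGAACGT", "g3": "TTTTGGGGTTTT"}
n = 3
counts = {name: nmer_counts(seq, n) for name, seq in genomes.items()}
vocab = sorted(set().union(*counts.values()))
word_id = {w: i for i, w in enumerate(vocab)}
# Each genome becomes a bag of N-mer "words".
docs = [
    [word_id[w] for w, c in counts[name].items() for _ in range(c)]
    for name in genomes
]
theta, phi = fit_lda(docs, num_topics=2, vocab_size=len(vocab))
```

Here `theta[d]` plays the role of the mixture of latent genetic patterns in genome `d`, and `phi[k]` is the weighted mixture of N-mer features for pattern `k`. The same sketch applies to the taxon abundance case if each sample is turned into a bag of species ids repeated according to their counts.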
