Journal: Nucleic Acids Research

Analyzing large biological datasets with association networks

Abstract

Due to advances in high-throughput biotechnologies, biological information is being collected in databases at an amazing rate, requiring novel computational approaches that process the collected data into new knowledge in a timely manner. In this study, we propose a computational framework for discovering modular structure, relationships and regularities in complex data. The framework uses a semantic-preserving vocabulary to convert records of biological annotations of an object, such as an organism, gene, chemical or sequence, into networks (Anets) of the associated annotations. An association between a pair of annotations in an Anet is determined by the similarity of their co-occurrence patterns with all other annotations in the data. This feature captures associations between annotations that do not necessarily co-occur with each other, and it facilitates discovery of the most significant relationships in the collected data through clustering and visualization of the Anet. To demonstrate this approach, we applied the framework to the analysis of metadata from the Genomes OnLine Database and produced a biological map of sequenced prokaryotic organisms with three major clusters of metadata that represent pathogens, environmental isolates and plant symbionts.
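
The idea of linking two annotations by the similarity of their co-occurrence patterns can be illustrated with a minimal Python sketch. The abstract does not say which similarity measure or cutoff the authors use, so the cosine similarity, the `threshold` parameter, the function names and the toy records below are all assumptions made for illustration, not the published method.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_counts(records):
    """Count how often each pair of annotations appears in the same record."""
    counts = {}
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def profile_similarity(a, b, vocabulary, counts):
    """Cosine similarity (an assumed measure) between the co-occurrence
    profiles of a and b, taken over all annotations other than a and b."""
    def co(x, y):
        return counts.get((x, y) if x < y else (y, x), 0)

    others = [t for t in vocabulary if t != a and t != b]
    va = [co(a, t) for t in others]
    vb = [co(b, t) for t in others]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = sqrt(sum(x * x for x in va)) * sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


def build_anet(records, threshold=0.5):
    """Return weighted edges between annotations whose profiles are similar.
    The threshold is a hypothetical cutoff chosen for this sketch."""
    vocabulary = sorted({t for record in records for t in record})
    counts = cooccurrence_counts(records)
    edges = {}
    for a, b in combinations(vocabulary, 2):
        score = profile_similarity(a, b, vocabulary, counts)
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Toy annotation records (invented for illustration only).
records = [
    {"pathogen", "human host", "mesophile"},
    {"disease-causing", "human host", "mesophile"},
    {"soil", "thermophile"},
]
counts = cooccurrence_counts(records)
edges = build_anet(records)
```

In this toy data, "pathogen" and "disease-causing" never appear in the same record, yet they are linked in `edges` because both co-occur with "human host" and "mesophile". "pathogen" and "soil" share no co-occurring annotations and are not linked. The resulting edge list could then be passed to any clustering or graph-visualization tool.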
