BMC Bioinformatics

A new unsupervised gene clustering algorithm based on the integration of biological knowledge into expression data


Abstract

Background: Gene clustering algorithms are widely used by biologists when analysing omics data. Classical gene clustering strategies rely on expression data only, either directly, as in heatmaps, or indirectly, as in clustering based on coexpression networks. However, these strategies may not be sufficient to bring out all potential relationships among genes.

Results: We propose a new unsupervised gene clustering algorithm based on the integration of external biological knowledge, such as Gene Ontology annotations, into expression data. We introduce a new distance between genes that integrates biological knowledge into the analysis of expression data: two genes are close if they have both similar expression profiles and similar functional profiles. A classical algorithm (e.g. K-means) is then used to obtain gene clusters. In addition, we propose an automatic procedure for evaluating gene clusters. This procedure is based on two indicators that measure the global coexpression and the biological homogeneity of gene clusters. Each indicator is associated with a hypothesis test, which complements it with a p-value. Our clustering algorithm is compared with heatmap clustering and with clustering based on a gene coexpression network, on both simulated and real data. In both cases, it outperforms the other methodologies, as it provides the highest proportion of significantly coexpressed and biologically homogeneous gene clusters, which are good candidates for interpretation.

Conclusion: Our new clustering algorithm provides a higher proportion of good candidates for interpretation. We therefore expect the interpretation of these clusters to help biologists formulate new hypotheses on the relationships among genes.
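The abstract does not give the formula for the new distance, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the expression distance is (1 - Pearson correlation) / 2, the functional distance is the Jaccard distance between annotation sets, and the two are mixed by a hypothetical weight `alpha`. Because K-means needs coordinates rather than a distance function, the sketch swaps in a simple k-medoids loop with a deterministic farthest-first start. The gene names, annotation labels and function names are all made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def expression_distance(x, y):
    """Assumed expression distance: (1 - r) / 2, which lies in [0, 1]."""
    return (1.0 - pearson(x, y)) / 2.0


def functional_distance(terms_a, terms_b):
    """Assumed functional distance: Jaccard distance between annotation sets."""
    union = terms_a | terms_b
    if not union:
        return 1.0  # assumption: unannotated genes are maximally dissimilar
    return 1.0 - len(terms_a & terms_b) / len(union)


def combined_distance(a, b, expr, annot, alpha=0.5):
    """Hypothetical weighted mix of expression and functional distances."""
    return (alpha * expression_distance(expr[a], expr[b])
            + (1.0 - alpha) * functional_distance(annot[a], annot[b]))


def farthest_first(genes, dist, k):
    """Deterministic start: repeatedly pick the gene farthest from the medoids."""
    medoids = [genes[0]]
    while len(medoids) < k:
        candidates = [g for g in genes if g not in medoids]
        medoids.append(max(candidates,
                           key=lambda g: min(dist(g, m) for m in medoids)))
    return medoids


def k_medoids(genes, dist, k, n_iter=20):
    """Simple k-medoids loop (used here in place of K-means)."""
    medoids = farthest_first(genes, dist, k)
    for _ in range(n_iter):
        clusters = {m: [] for m in medoids}
        for g in genes:
            # A medoid always stays in its own cluster; others join the nearest.
            nearest = g if g in clusters else min(medoids, key=lambda m: dist(g, m))
            clusters[nearest].append(g)
        # New medoid of each cluster: the member with the smallest total distance.
        updated = [min(members, key=lambda c: sum(dist(c, g) for g in members))
                   for members in clusters.values()]
        if updated == medoids:
            break
        medoids = updated
    return list(clusters.values())


# Toy data (made up): two groups with opposite expression trends.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [4.2, 2.8, 2.1, 0.9],
}
annot = {
    "g1": {"term_a", "term_b"},
    "g2": {"term_a", "term_b", "term_c"},
    "g3": {"term_x"},
    "g4": {"term_x", "term_y"},
}


def dist(a, b):
    return combined_distance(a, b, expr, annot, alpha=0.5)


clusters = k_medoids(list(expr), dist, k=2)
print(clusters)
```

With `alpha = 1` the sketch reduces to clustering on expression alone, and with `alpha = 0` to clustering on annotations alone. Intermediate values require genes to agree on both profiles to be considered close, which is the behaviour the abstract describes.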
