Journal of Bioinformatics and Computational Biology

PAcluster: Clustering polyadenylation site data using canonical correlation analysis


Abstract

Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites is placing new demands on the development of computational methods to investigate APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied in APA, and most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms in APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between each pair of genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced the clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution to the clustering of large APA-specific biological datasets.
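
The general idea of a CCA-based distance followed by hierarchical clustering can be sketched as below. This is a minimal illustration in Python, not the PAcluster R package or its API. It assumes each gene is represented as a (conditions x poly(A) sites) matrix, uses one minus the first canonical correlation as the distance between two genes, and clusters with average linkage. The function names, the synthetic data, and the linkage choice are all hypothetical, and the second layer described in the abstract (the statistical significance test) is omitted.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two (conditions x sites) matrices."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Orthonormal bases of the two column spaces; SVD tolerates rank deficiency.
    ux, sx, _ = np.linalg.svd(x, full_matrices=False)
    uy, sy, _ = np.linalg.svd(y, full_matrices=False)
    ux = ux[:, sx > 1e-10 * sx.max()]
    uy = uy[:, sy > 1e-10 * sy.max()]
    if ux.shape[1] == 0 or uy.shape[1] == 0:
        return 0.0  # a gene with no variation across conditions
    # Singular values of ux^T uy are the canonical correlations.
    s = np.linalg.svd(ux.T @ uy, compute_uv=False)
    return float(np.clip(s[0], 0.0, 1.0))


def cca_distance_matrix(genes):
    """Pairwise distance 1 - rho, where rho is the first canonical correlation."""
    n = len(genes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            rho = first_canonical_correlation(genes[i], genes[j])
            d[i, j] = d[j, i] = 1.0 - rho
    return d


def cluster_genes(genes, n_clusters):
    """Average-linkage hierarchical clustering on the CCA-based distances."""
    d = cca_distance_matrix(genes)
    tree = linkage(squareform(d, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def make_synthetic_genes(n_conditions=60, genes_per_group=4, n_sites=2, seed=0):
    """Hypothetical toy data: two groups of genes, each driven by one latent signal."""
    rng = np.random.default_rng(seed)
    genes = []
    for _ in range(2):
        latent = rng.normal(size=n_conditions)
        for _ in range(genes_per_group):
            weights = rng.uniform(0.5, 1.5, size=n_sites)
            noise = 0.1 * rng.normal(size=(n_conditions, n_sites))
            genes.append(np.outer(latent, weights) + noise)
    return genes


if __name__ == "__main__":
    labels = cluster_genes(make_synthetic_genes(), n_clusters=2)
    print(labels)
```

The first canonical correlation is only informative when the number of conditions clearly exceeds the combined number of sites of the two genes; with too few conditions it approaches 1 for any pair, which is one reason a significance test on top of the raw correlation is useful.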
