Journal of Bioinformatics and Computational Biology

PAcluster: Clustering polyadenylation site data using canonical correlation analysis

Abstract

Alternative polyadenylation (APA) is a pervasive mechanism that contributes to gene regulation. The growing number of sequenced poly(A) sites is placing new demands on the development of computational methods for investigating APA regulation. Cluster analysis is important for identifying groups of co-expressed genes. However, clustering of poly(A) sites has not been extensively studied, and most APA studies have failed to consider the distribution, abundance, and variation of APA sites in each gene. Here we constructed a two-layer model based on canonical correlation analysis (CCA) to explore the underlying biological mechanisms of APA regulation. The first layer quantifies the general correlation of APA sites across various conditions between genes, and the second layer identifies genes with statistically significant correlation in their APA patterns to infer APA-specific gene clusters. Using hierarchical clustering, we comprehensively compared our method with four other widely used distance measures based on three performance indexes. Results showed that our method significantly enhanced the clustering performance for both synthetic and real poly(A) site data and could generate clusters with more biological meaning. We have implemented the CCA-based method as a publicly available R package called PAcluster, which provides an efficient solution for clustering large APA-specific biological datasets.
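The general idea of a CCA-based gene distance followed by hierarchical clustering can be illustrated with a minimal Python sketch. This is not the PAcluster R package and does not reproduce its two-layer model: the distance used here (one minus the first canonical correlation), the average-linkage choice, the synthetic data, and all function names are assumptions made for illustration, and the significance-testing layer described in the abstract is omitted. Each gene is assumed to be a matrix with one row per condition and one column per poly(A) site.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def _orthonormal_basis(mat, tol=1e-10):
    """Orthonormal basis of the column space of the column-centered matrix."""
    centered = mat - mat.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    if s.size == 0 or s[0] <= tol:
        return u[:, :0]  # no variation at all: empty basis
    return u[:, s > tol * s[0]]


def first_canonical_correlation(x, y):
    """Largest canonical correlation between two (conditions x sites) matrices."""
    qx = _orthonormal_basis(np.asarray(x, dtype=float))
    qy = _orthonormal_basis(np.asarray(y, dtype=float))
    if qx.shape[1] == 0 or qy.shape[1] == 0:
        return 0.0
    # Singular values of Qx^T Qy are the canonical correlations.
    sv = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(sv[0], 0.0, 1.0))


def cca_distance_matrix(genes):
    """Pairwise distance 1 - rho_1 (an illustrative choice, not PAcluster's)."""
    n = len(genes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - first_canonical_correlation(genes[i], genes[j])
            dist[i, j] = dist[j, i] = d
    return dist


def cluster_genes(genes, n_clusters):
    """Average-linkage hierarchical clustering on the CCA-based distances."""
    dist = cca_distance_matrix(genes)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


def make_synthetic_genes(seed=0, n_conditions=40):
    """Hypothetical toy data: six genes, two underlying condition profiles."""
    rng = np.random.default_rng(seed)
    t = 2.0 * np.pi * np.arange(n_conditions) / n_conditions
    profiles = [np.sin(t), np.cos(t)]
    sites_per_gene = [2, 3, 2, 3, 2, 3]
    genes = []
    for k, n_sites in enumerate(sites_per_gene):
        latent = profiles[0] if k < 3 else profiles[1]
        weights = rng.uniform(0.5, 2.0, size=n_sites)
        noise = rng.normal(scale=0.05, size=(n_conditions, n_sites))
        genes.append(np.outer(latent, weights) + noise)
    return genes


if __name__ == "__main__":
    labels = cluster_genes(make_synthetic_genes(), n_clusters=2)
    print(labels)
```

Because canonical correlation compares the subspaces spanned by each gene's sites, genes with different numbers of poly(A) sites can be compared directly, which per-site distance measures do not allow. The sketch is only meaningful when the number of conditions clearly exceeds the combined number of sites in the two genes; otherwise the first canonical correlation tends toward 1 regardless of the data.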
