Journal: International journal of computational models and algorithms in medicine

Privacy Preserving Principal Component Analysis Clustering for Distributed Heterogeneous Gene Expression Datasets

Abstract

In this paper, we present approaches to perform principal component analysis (PCA) clustering for distributed heterogeneous genomic datasets with privacy protection. The approaches allow data providers to collaborate to identify gene profiles from a global viewpoint while protecting the sensitive genomic data from possible privacy leaks. We further develop a framework for privacy-preserving PCA-based gene clustering, which includes two types of participants: data providers and a trusted central site (TCS). Two different methodologies are employed: Collective PCA (C-PCA) and Repeating PCA (R-PCA). The C-PCA approach requires local sites to transmit a sample of original data to the TCS and can be applied to any heterogeneous datasets. The R-PCA approach requires all local sites to have the same or a similar number of columns, but releases no original data. Experiments on five independent genomic datasets show that both the C-PCA and R-PCA approaches maintain very good accuracy relative to the centralized scenario.
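The abstract does not spell out the steps of C-PCA, so the following is only a minimal sketch of the general idea it describes (local sites share a sample of original data with the TCS, which derives a global PCA basis), not the authors' algorithm. It assumes, hypothetically, that every site holds the same genes in the same row order but different experiment columns, and that the data are synthetic. The function names `tcs_fit_basis` and `site_partial_scores` are illustrative inventions, and the sketch makes no claim about the privacy guarantees of the exchanged values.

```python
import numpy as np

def tcs_fit_basis(site_samples, k):
    """TCS side (hypothetical): PCA on the column-wise concatenation of sampled rows."""
    pooled = np.hstack(site_samples)              # (n_sampled_genes, total_columns)
    means = pooled.mean(axis=0)
    # Rows of vt are principal directions in the concatenated column space.
    _, _, vt = np.linalg.svd(pooled - means, full_matrices=False)
    loadings = vt[:k].T                           # (total_columns, k)
    # Split the means and loadings back into one block per site.
    widths = [s.shape[1] for s in site_samples]
    cuts = np.cumsum(widths)[:-1]
    return np.split(means, cuts), np.split(loadings, cuts, axis=0)

def site_partial_scores(local_data, local_means, local_loadings):
    """Site side (hypothetical): project local columns onto the site's loading block."""
    return (local_data - local_means) @ local_loadings

# Synthetic example: three sites, same 200 genes, different numbers of columns.
rng = np.random.default_rng(0)
n_genes = 200
sites = [rng.normal(size=(n_genes, c)) for c in (5, 8, 3)]

# Assumption: all sites agree on which gene rows to share with the TCS.
sample_idx = rng.choice(n_genes, size=40, replace=False)
means, loadings = tcs_fit_basis([x[sample_idx] for x in sites], k=2)

# Each site sends only its k-dimensional partial scores; their sum is the
# projection of every gene onto the sample-derived global components.
partials = [site_partial_scores(x, m, v) for x, m, v in zip(sites, means, loadings)]
scores = np.sum(partials, axis=0)                 # (n_genes, 2)
```

Any standard clustering method could then be run on `scores` to group genes. Because the projection is linear, summing the per-site partial scores gives the same result as projecting the full concatenated matrix onto the same basis, which is why only the sampled rows need to leave each site in this sketch.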