Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data

Godichon-Baggioni Antoine; Maugis-Rabusseau Cathy; Rau Andrea

Abstract

Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e. data whose rows belong to the simplex) remains largely unexplored in cases where the observed value is equal or close to zero for one or more samples. This work is motivated by the analysis of two applications, both focused on the categorization of compositional profiles: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib' bicycle sharing system in Paris, France. For both of these applications, we make use of appropriately chosen data transformations, including the Centered Log Ratio and a novel extension called the Log Centered Log Ratio, in conjunction with the K-means algorithm. We use a non-asymptotic penalized criterion, whose penalty is calibrated with the slope heuristics, to select the number of clusters. Finally, we illustrate the performance of this clustering strategy, which is implemented in the Bioconductor package coseq, on both the gene expression and bicycle sharing system data.
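The core pipeline named in the abstract, a Centered Log Ratio (CLR) transformation followed by K-means, can be sketched as below. This is a minimal illustration only, not the coseq implementation: the function names `clr` and `kmeans`, the toy count matrix, and the small pseudocount used to keep logarithms finite are all assumptions made here. The Log Centered Log Ratio extension and the slope-heuristics selection of the number of clusters are not reproduced.

```python
import numpy as np


def clr(profiles, pseudocount=1e-6):
    """Centered log-ratio transform of rows lying on the simplex.

    The pseudocount is an assumption of this sketch (it keeps log() finite
    when a component is exactly zero); it is not taken from the paper.
    """
    p = np.asarray(profiles, dtype=float) + pseudocount
    p = p / p.sum(axis=1, keepdims=True)  # re-close each row onto the simplex
    log_p = np.log(p)
    # Subtracting the row mean of the logs divides by the geometric mean.
    return log_p - log_p.mean(axis=1, keepdims=True)


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers (illustrative)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the centers that are returned.
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers


# Hypothetical toy data: six count profiles over three conditions.
counts = np.array([
    [90, 5, 5], [80, 10, 10], [85, 10, 5],
    [5, 90, 5], [10, 80, 10], [5, 85, 10],
])
profiles = counts / counts.sum(axis=1, keepdims=True)  # compositional rows
z = clr(profiles)
labels, centers = kmeans(z, k=2)
```

In this sketch the pseudocount is the weak point: a component that is exactly zero is mapped to a very large negative CLR value, which can dominate the Euclidean distances used by K-means. That sensitivity to zero or near-zero values is the situation the abstract identifies as largely unexplored.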

Bibliographic details

Source
Journal of Applied Statistics | 2019, No. 4 | pp. 47-65 | 19 pages
Authors
Godichon-Baggioni Antoine; Maugis-Rabusseau Cathy; Rau Andrea;
Author affiliations

Univ Toulouse CNRS UMR 5219 Inst Math Toulouse UPS F-31062 Toulouse 9 France;

Univ Toulouse CNRS UMR 5219 Inst Math Toulouse INSA F-31077 Toulouse France;

Univ Paris Saclay AgroParisTech INRA GABI Paris France;

Indexing information
Original format: PDF
Language: English
Keywords
Clustering; compositional data; data transformations; K-means;

Similar literature

1. K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks [J]. Yao Min, Wu Qinghua, Li Juan, SIAM journal on applied dynamical systems. 2017, No. 2
2. K-walks: clustering gene-expression data using a K-means clustering algorithm optimised by random walks [J]. Yao Min, Wu Qinghua, Li Juan, International journal of data mining and bioinformatics. 2016, No. 2
3. PSO-based K-Means clustering with enhanced cluster matching for gene expression data [J]. Yau-King Lam, P. W. M. Tsang, Chi-Sing Leung, Neural Computing and Applications. 2013, No. 7-8
4. Comparison between the Applications of Fragment-Based and Vertex-Based GPU Approaches in K-Means Clustering of Time Series Gene Expression Data [C]. Yau-King Lam, Wuchao Situ, P.W.M. Tsang, International conference on neural information processing; ICONIP 2011. 2011
5. K-means clustering with automatic determination of K using a Multiobjective Genetic Algorithm with applications to microarray gene expression data [D]. Shaw, Matthew Karl Ellis. 2015
6. Genetic weighted k-means algorithm for clustering large-scale gene expression data [O]. Fang-Xiang Wu. 2008
7. Clustering transformed compositional data using K-means, with applications in gene expression and bicycle sharing system data [O]. Godichon-Baggioni, Antoine, Maugis-Rabusseau, Cathy, Rau, Andrea. 2017
