HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree

Published in: Cancer Informatics

Abstract

Hierarchical clustering (HC) is one of the most frequently used methods in computational biology for analyzing high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, the HC tree is cut at a fixed height, and each contiguous branch of data points below that height is considered a separate cluster. However, a fixed-height branch cut may not be ideal when one expects a complicated tree structure with nested clusters. Furthermore, because related background information is not used in selecting the cutoff, the induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip, available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep in the tree. Along with the data set from which the HC tree is derived, the cluster extraction process uses commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against the various sources of variation that “haunt” high-dimensional genomics data. Since the clustering process is guided by the background information, the clusters are easy to interpret. Unlike existing packages, HCsnip places no constraint on the type of data to be clustered. In particular, the package accepts patient follow-up data to guide the cluster extraction process. To our knowledge, HCsnip is the first package able to decompose the HC tree into clusters by piecewise snipping under the guidance of patient time-to-event information.
Our implementation of the semi-supervised HC tree snipping framework is generic and can be combined with other algorithms that operate on detected clusters.
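The contrast between a single fixed-height cut and variable-height snipping guided by follow-up data can be illustrated with a small Python sketch. It is not HCsnip's R code and does not reproduce the package's actual criteria: it assumes 1-D toy data, average linkage, exhaustive enumeration of snips with a fixed number of clusters, and a k-sample log-rank chi-square statistic as the survival-based score. Every function name and data value below is hypothetical.

```python
# Toy sketch only: NOT HCsnip's R implementation or its scoring criteria.
# All names (Node, average_linkage, fixed_height_cut, snips, logrank_chi2, best_snip)
# and the data below are hypothetical illustrations.
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional

@dataclass(frozen=True)
class Node:
    leaves: frozenset                 # indices of the data points below this node
    height: float                     # merge height (0.0 for a single data point)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def average_linkage(points: List[float]) -> Node:
    """Naive agglomerative clustering of 1-D points with average linkage."""
    def dist(a: Node, b: Node) -> float:
        total = sum(abs(points[i] - points[j]) for i in a.leaves for j in b.leaves)
        return total / (len(a.leaves) * len(b.leaves))
    clusters = [Node(frozenset([i]), 0.0) for i in range(len(points))]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(Node(a.leaves | b.leaves, dist(a, b), a, b))
    return clusters[0]

def fixed_height_cut(node: Node, h: float) -> List[frozenset]:
    """One horizontal cut at h >= 0: the maximal subtrees merged at or below h."""
    if node.height <= h:
        return [node.leaves]
    return fixed_height_cut(node.left, h) + fixed_height_cut(node.right, h)

def snips(node: Node, k: int) -> List[List[frozenset]]:
    """Every partition of `node` into exactly k subtrees, possibly cut at different heights."""
    if k == 1:
        return [[node.leaves]]
    if node.left is None:             # a single data point cannot be split further
        return []
    return [lp + rp
            for k_left in range(1, k)
            for lp in snips(node.left, k_left)
            for rp in snips(node.right, k - k_left)]

def logrank_chi2(times, events, groups) -> float:
    """k-sample log-rank chi-square (df = k - 1); events[i] is 1 if observed, 0 if censored."""
    labels = sorted(set(groups))
    k = len(labels)
    U = [0.0] * k                      # observed minus expected events per group
    V = [[0.0] * k for _ in range(k)]  # covariance matrix of U
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = [sum(1 for ti, g in zip(times, groups) if ti >= t and g == lab) for lab in labels]
        d = [sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == lab)
             for lab in labels]
        n_tot, d_tot = sum(n), sum(d)
        for i in range(k):
            U[i] += d[i] - d_tot * n[i] / n_tot
            if n_tot > 1:
                for j in range(k):
                    kron = 1.0 if i == j else 0.0
                    V[i][j] += (d_tot * (n_tot - d_tot) / (n_tot - 1)
                                * n[i] / n_tot * (kron - n[j] / n_tot))
    # The entries of U sum to zero, so drop the last group and solve V' x = U'
    # by Gauss-Jordan elimination with partial pivoting; chi2 = U' . x.
    m = k - 1
    A = [V[i][:m] + [U[i]] for i in range(m)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(m):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return sum(U[i] * A[i][m] / A[i][i] for i in range(m))

def best_snip(tree: Node, k: int, times, events) -> List[frozenset]:
    """Semi-supervised choice: the k-cluster snip whose clusters differ most in survival."""
    def score(partition: List[frozenset]) -> float:
        groups = [0] * len(times)
        for label, cluster in enumerate(partition):
            for i in cluster:
                groups[i] = label
        return logrank_chi2(times, events, groups)
    return max(snips(tree, k), key=score)

# Toy data: A={0,1,2} and B={3,4} lie near 0-11; C1={5,6} and C2={7,8} lie near 30-35.
points = [0, 1, 2, 10, 11, 30, 31, 34, 35]
times = [8, 12, 10, 9, 11, 1, 2, 20, 21]   # hypothetical follow-up times
events = [1] * 9                           # every event observed (no censoring)
tree = average_linkage(points)
print(fixed_height_cut(tree, 5.0))         # C, B, A: the horizontal cut keeps C1 and C2 together
print(best_snip(tree, 3, times, events))   # C1, C2 and A+B merged: no single cut yields this
```

Because every candidate here has the same number of clusters (and hence the same degrees of freedom), comparing raw chi-square values is reasonable; comparing snips with different numbers of clusters would call for p-values or another df-aware criterion. The brute-force enumeration also grows quickly with tree size, so it is only meant for toy trees.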