Journal: Bioinformatics

Markers improve clustering of CGH data

Abstract

Motivation: We consider the problem of clustering a population of Comparative Genomic Hybridization (CGH) data samples using similarity-based clustering methods. A key requirement for clustering is to avoid using the noisy aberrations in the CGH samples.

Results: We develop a dynamic programming algorithm to identify a small set of important genomic intervals called markers. The advantage of using these markers is that the potentially noisy genomic intervals are excluded during the clustering process. We also develop two clustering strategies using these markers. The first, a prototype-based approach, maximizes the support for the markers. The second, a similarity-based approach, develops a new similarity measure called RSim and refines clusters with the aim of maximizing the RSim measure between the samples in the same cluster. Our results demonstrate that the markers we found represent the aberration patterns of cancer types well and that they improve the quality of clustering significantly.

Availability: All software developed in this paper and all the datasets used are available from the authors upon request.

Contact: juliu@cise.ufl.edu
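The abstract does not define marker support or RSim, so the sketch below only illustrates the general idea of scoring samples on marker intervals alone, so that other (potentially noisy) intervals cannot influence the result. The encoding (1 = gain, -1 = loss, 0 = no aberration), the function names `marker_support` and `marker_similarity`, and the toy data are all assumptions made for illustration; they are not the authors' algorithm, and `marker_similarity` is not the RSim measure.

```python
from typing import Sequence

# Hypothetical encoding (not from the paper): one integer per genomic
# interval, 1 = gain, -1 = loss, 0 = no aberration.
Sample = Sequence[int]


def marker_support(samples: Sequence[Sample], position: int, aberration: int) -> float:
    """Fraction of samples that show `aberration` at `position`."""
    if not samples:
        return 0.0
    hits = sum(1 for s in samples if s[position] == aberration)
    return hits / len(samples)


def marker_similarity(a: Sample, b: Sample, marker_positions: Sequence[int]) -> int:
    """Count marker positions where both samples carry the same aberration.

    Positions outside `marker_positions` are ignored, so non-marker
    intervals cannot contribute to the score. Illustrative stand-in only,
    not the RSim measure from the paper.
    """
    return sum(1 for p in marker_positions if a[p] != 0 and a[p] == b[p])


# Toy data, invented for this example.
samples = [
    [1, 0, -1, 0, 1],
    [1, 1, -1, 0, 0],
    [0, 0, -1, 1, 1],
]
markers = [0, 2]  # assumed marker positions

print(marker_support(samples, 2, -1))
print(marker_similarity(samples[0], samples[1], markers))
print(marker_similarity(samples[0], samples[2], markers))
```

In this toy setup, samples 0 and 2 agree at position 4, but that position is not a marker and so does not count toward their score. That is the effect the abstract attributes to clustering on markers only.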