Clustering binary fingerprint vectors with missing values for DNA array data analysis

Abstract

Oligonucleotide fingerprinting is a powerful DNA-array-based method for characterizing cDNA and ribosomal RNA gene (rDNA) libraries. It has many applications, including gene expression profiling and DNA clone classification; we are especially interested in the latter. A key step in the method is the cluster analysis of fingerprint data obtained from DNA array hybridization experiments. Most existing clustering approaches use (normalized) real intensity values and thus do not treat positive and negative hybridization signals equally (positive signals are emphasized much more). In this paper, we consider a discrete approach. Fingerprint data are first normalized and binarized using control DNA clones. Because this binarization process may leave some values unresolved (or missing), we formulate the clustering of (binary) oligonucleotide fingerprints as a combinatorial optimization problem that attempts to identify clusters and resolve the missing values in the fingerprints simultaneously. We study the computational complexity of this clustering problem and of a natural parameterized version, and present an efficient greedy algorithm based on minimum clique partition on graphs. The algorithm takes advantage of some unique properties of the graphs considered here, which allow us to efficiently find the maximum cliques as well as some special maximal cliques. Our experimental results on simulated and real data demonstrate that the algorithm runs faster and performs better than some popular hierarchical and graph-based clustering methods. The results on real data from DNA clone classification also suggest that this discrete approach is more accurate than clustering methods based on real intensity values, in terms of separating clones that have different characteristics with respect to the given oligonucleotide probes.
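To make the formulation above concrete, the following is a minimal sketch of clustering binary fingerprints with missing values as a greedy clique partition. It is not the algorithm from the paper, which selects maximum and special maximal cliques; it uses a simpler first-fit rule. The encoding (strings over `0`, `1`, and `N` for an unresolved value) and the names `MISSING`, `merge`, and `greedy_clique_partition` are assumptions made for illustration. Two fingerprints are treated as compatible when they agree at every position where both are resolved; a set of pairwise compatible fingerprints can always be resolved to one common vector, so each cluster carries a consensus that fills in missing values as members are added.

```python
from typing import Dict, List, Optional

MISSING = "N"  # hypothetical symbol for an unresolved (missing) value


def merge(consensus: str, fingerprint: str) -> Optional[str]:
    """Merge a fingerprint into a cluster consensus.

    Returns the updated consensus, or None if the two vectors
    disagree at a position where both are resolved.
    Assumes both strings have the same length.
    """
    merged = []
    for a, b in zip(consensus, fingerprint):
        if a == MISSING:
            merged.append(b)          # take whatever the fingerprint has
        elif b == MISSING or a == b:
            merged.append(a)          # keep the resolved consensus value
        else:
            return None               # resolved conflict: incompatible
    return "".join(merged)


def greedy_clique_partition(fingerprints: List[str]) -> List[Dict]:
    """First-fit greedy partition into groups of mutually compatible fingerprints.

    A simplification for illustration: the seed is simply the first
    unassigned fingerprint, not a maximum clique of the compatibility graph.
    """
    unassigned = list(range(len(fingerprints)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        consensus = fingerprints[seed]
        members = [seed]
        remaining = []
        for i in unassigned[1:]:
            merged = merge(consensus, fingerprints[i])
            if merged is None:
                remaining.append(i)   # conflicts with this cluster; try later
            else:
                consensus = merged
                members.append(i)
        clusters.append({"members": members, "consensus": consensus})
        unassigned = remaining
    return clusters


if __name__ == "__main__":
    data = ["1N0", "110", "0N1", "N01", "1NN"]
    for cluster in greedy_clique_partition(data):
        print(cluster)
```

Because the result of a first-fit rule depends on input order, a real implementation would need a principled way to choose which clique to extract next; the abstract indicates the authors do this by exploiting structural properties of the graphs involved.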