Sequence-based multiscale modeling for high-throughput chromosome conformation capture (Hi-C) data analysis

Abstract

In this paper, we introduce sequence-based multiscale modeling for biomolecular data analysis. We use spectral clustering in our modeling and show how sequence-based global-scale clustering differs from local-scale clustering. Two types of distance can be used in data clustering: Euclidean (or spatial) distance and genomic (or sequential) distance. Clusters from sequence-based global-scale models optimize spatial distances, so spatially adjacent loci are more likely to be assigned to the same cluster. Sequence-based local-scale models, on the other hand, produce clusters that optimize genomic distances; in these models, sequentially adjoining loci tend to be clustered together. We propose two sequence-based multiscale models (SeqMMs) for studying hierarchical chromosome structures, including genomic compartments and topologically associating domains (TADs). We find that genomic compartments are determined only by global-scale information in the Hi-C data: removing all local interactions within a diagonal band region as large as 10 Mb in genomic distance has almost no significant influence on the final compartment results. In TAD analysis, we find that when the sequential scale is small, a tiny variation in the diagonal band region of a contact map results in a large change in the predicted TAD boundaries. When the scale value exceeds a threshold, the TAD boundaries become very consistent, and this threshold is closely related to TAD sizes. Comparing our results with those previously obtained using a spectral clustering model, we find that our method is more robust and reliable. Finally, we demonstrate that almost all TAD boundaries from both clustering methods are local minima of a TAD summation function.
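The abstract contrasts clustering on contacts far from the diagonal (global scale) with clustering on contacts near it (local scale). The sketch below illustrates that contrast under simplifying assumptions and is not the SeqMM formulation from the paper. The helpers `band_split`, `spectral_bipartition`, `agreement` and `toy_contact_map` are hypothetical. The scale is modeled as a plain band width in bins, and the two-way split uses the sign of the Fiedler vector of the symmetric normalized graph Laplacian. The synthetic map places alternating A/B blocks along the sequence so that same-type loci contact strongly at any genomic distance, a crude stand-in for compartments.

```python
import numpy as np


def band_split(contact, width, keep):
    """Split a contact map by genomic distance |i - j| (hypothetical helper).

    keep="global": zero every contact with |i - j| < width (drop the diagonal band).
    keep="local":  keep only contacts with |i - j| < width.
    """
    idx = np.arange(contact.shape[0])
    dist = np.abs(idx[:, None] - idx[None, :])
    if keep == "global":
        mask = dist >= width
    elif keep == "local":
        mask = dist < width
    else:
        raise ValueError("keep must be 'global' or 'local'")
    return np.where(mask, contact, 0.0)


def spectral_bipartition(affinity):
    """Two-way spectral clustering: sign of the Fiedler vector of the
    symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    w = (affinity + affinity.T) / 2.0
    deg = w.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)  # avoid dividing by zero for empty rows
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    lap = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return (vecs[:, 1] > 0).astype(int)  # column 1 is the Fiedler vector


def agreement(a, b):
    """Fraction of loci with matching labels, up to swapping the two labels."""
    same = float(np.mean(a == b))
    return max(same, 1.0 - same)


def toy_contact_map(n=60, block=15, same=1.0, cross=0.05):
    """Synthetic map: alternating A/B blocks along the sequence; loci of the
    same type contact strongly regardless of their genomic distance."""
    comp = (np.arange(n) // block) % 2
    contact = np.where(comp[:, None] == comp[None, :], same, cross)
    return contact, comp


if __name__ == "__main__":
    contact, comp = toy_contact_map()
    glob = spectral_bipartition(band_split(contact, width=3, keep="global"))
    loc = spectral_bipartition(band_split(contact, width=3, keep="local"))
    # Global scale: clusters follow the interleaved A/B pattern even without the band.
    print("global vs A/B agreement:", agreement(glob, comp))
    # Local scale: clusters are contiguous stretches of the sequence instead.
    print("local label changes along sequence:", int(np.sum(loc[1:] != loc[:-1])))
```

Varying `width` is a quick way to probe how sensitive each view is to the band size on this toy map. Any behaviour observed here is illustrative only and does not reproduce the paper's results on real Hi-C data.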

Bibliographic details

  • Journal: PLoS Clinical Trials
  • Author: Kelin Xia
  • Year (volume), issue: 2012 (13), 2
  • Year: 2012
  • Pages: e0191899
  • Total pages: 16
  • Original format: PDF
