Journal: BMC Bioinformatics

Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization


Abstract

Background Knowing the three-dimensional (3D) structure of chromatin is important for obtaining a complete picture of the regulatory landscape, and changes in this 3D structure have been implicated in diseases. Existing approaches that attempt to predict long-range chromatin interactions focus only on interactions between specific genomic regions, namely promoters and enhancers, and neglect other possibilities such as the so-called structural interactions involving intervening chromatin.
Results We present a method that can be trained on 5C data, using the genetic sequence of the candidate loci, to predict potential genome-wide interaction partners of a particular locus of interest. We built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histograms (ODH) representation. The method performs well, with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across the cell lines GM12878, K562 and HeLa-S3. Where a locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between the models of different loci. In this scenario, across the three cell lines, the method attained an average increase of 0.09 in the AUC. When the models trained on 5C data are evaluated on an independent high-resolution Hi-C dataset (a considerably harder problem), they reach an AUC of 0.56 on average. Additionally, we developed new, intuitive visualization methods that enable interpretation of the sequence signals that contributed to the prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization.
Conclusions We demonstrated how our approach can 1) provide insights into the sequence features of locus-specific interaction partners, and 2) identify their cell-line specificity. That our models deem short tandem repeat sequences discriminative for predicting potential interaction partners suggests that these sequences could play a larger role in genome organization. Thus, our approach can (a) help to broadly understand, at the sequence level, chromatin interactions and higher-order structures such as (meta-) topologically associating domains (TADs); (b) be used to study regions omitted by existing prediction approaches that rely on various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of chromatin.
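The abstract names the oligomer distance histograms (ODH) representation but does not spell it out. As a rough illustration of the general idea only, the sketch below counts how often each ordered pair of k-mers occurs at each distance in a sequence. The function names (`odh_features`, `to_vector`), the parameters `k` and `max_dist`, and their default values are assumptions made for this example, not the authors' implementation or settings.

```python
from collections import Counter
from itertools import product


def odh_features(seq, k=1, max_dist=3, alphabet="ACGT"):
    """Count ordered k-mer pairs (a, b) whose start positions differ by d,
    for every distance d in 0..max_dist. Hypothetical helper for illustration."""
    seq = seq.upper()
    valid = set(alphabet)
    # One entry per start position; None marks k-mers containing other
    # symbols (for example N), which are skipped.
    kmers = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        kmers.append(word if set(word) <= valid else None)

    counts = Counter()
    for i, a in enumerate(kmers):
        if a is None:
            continue
        for d in range(max_dist + 1):
            j = i + d
            if j >= len(kmers):
                break
            b = kmers[j]
            if b is not None:
                counts[(a, b, d)] += 1
    return counts


def to_vector(counts, k=1, max_dist=3, alphabet="ACGT"):
    """Flatten the counts into a fixed-order vector of length
    (|alphabet|**k)**2 * (max_dist + 1), usable as classifier input."""
    oligos = ["".join(p) for p in product(alphabet, repeat=k)]
    return [
        counts.get((a, b, d), 0)
        for a in oligos
        for b in oligos
        for d in range(max_dist + 1)
    ]


if __name__ == "__main__":
    # A dinucleotide repeat concentrates its counts on a few (pair, distance)
    # entries, e.g. ("A", "A", 2) and ("A", "C", 1).
    repeat_counts = odh_features("ACACACAC", k=1, max_dist=3)
    for key, value in sorted(repeat_counts.items()):
        print(key, value)
```

In a representation of this kind, a short tandem repeat produces counts concentrated at distances that match its period, which is one plausible way such repeats could become discriminative features for a sequence-based classifier. The resulting vectors could be passed to any SVM implementation; the kernel and regularization choices are not given in the abstract.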
