Nucleic Acids Research

DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements



Abstract

Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of seven motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. Of these, 2412 are also enriched in H3K4me1, a mark of Trithorax-activated chromatin, suggesting that PREs/TREs have a common sequence code.
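As an illustration of what "enriched in a sequence motif" can mean in practice, the following minimal sketch counts motif occurrences on both strands and converts them to a per-kilobase density. It is not the SVM-MOCCA method or the feature set of any published predictor; the function names `reverse_complement`, `count_motif`, and `motif_density` are hypothetical, and only the GTGT motif is taken from the abstract.

```python
# Illustrative sketch only: motif-occurrence features for a DNA sequence.
# All function names here are hypothetical, not taken from the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping motif occurrences on both strands.

    A window matching either the motif or its reverse complement counts
    once, so palindromic motifs are not double-counted.
    """
    seq = seq.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def motif_density(seq: str, motifs: list[str]) -> list[float]:
    """Return occurrences per kilobase for each motif."""
    length = max(len(seq), 1)  # guard against empty input
    return [1000.0 * count_motif(seq, m) / length for m in motifs]


if __name__ == "__main__":
    # GTGT is the motif named in the abstract; the sequence is made up.
    print(motif_density("GTGTAAAAAA", ["GTGT"]))
```

A vector of such densities, one entry per motif, is the kind of input a classifier such as a support vector machine could be trained on to separate PRE from non-PRE sequences.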
