IEEE International Conference on Computational Advances in Bio and Medical Sciences

Keynote: High-resolution sequence and chromatin signatures predict transcription factor binding in the human genome

Abstract

Accurately modeling the DNA sequence preferences of transcription factors (TFs) and predicting their genomic binding sites are key problems in regulatory genomics. These efforts have long been frustrated by the limited availability and accuracy of TF binding site motifs. Today, protein binding microarray (PBM) experiments and chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments are generating unprecedented high-resolution data on in vitro and in vivo TF binding. This paper will present a flexible new discriminative framework for representing and learning TF binding preferences using these massive data sets. Support vector regression models were trained with a novel string kernel on PBM data to learn the mapping from probe sequences to binding intensities. Results confirm that the discriminative sequence models presented here significantly outperform existing motif discovery algorithms. ChIP-trained models greatly improved TF occupancy prediction over PBM-trained models, suggesting that ChIP data carry distinct in vivo sequence information. Finally, discriminative chromatin models were trained on histone modification ChIP-seq data, and models combining sequence and chromatin signatures strongly outperformed models using either one alone. This work establishes effective new techniques for analyzing next-generation sequencing data sets to study the interplay of chromatin and sequence in TF binding in the human genome.
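To make the idea of a string kernel over probe sequences concrete, here is a minimal sketch. The abstract does not describe the novel kernel itself, so the classic k-mer spectrum kernel is swapped in as a stand-in. The function names, the choice of k = 3, and the probe sequences are all hypothetical and are not taken from the paper or from any PBM data set.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences.

    This is the standard spectrum kernel, used here only as a stand-in
    for the (unspecified) kernel mentioned in the abstract.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers that are absent from the other sequence.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def normalized_kernel(a, b, k=3):
    """Cosine-normalized spectrum kernel, so that K(s, s) == 1."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def gram_matrix(seqs, k=3):
    """Pairwise kernel values for a list of sequences."""
    return [[normalized_kernel(s, t, k) for t in seqs] for s in seqs]


# Hypothetical probe sequences, made up for illustration only.
probes = ["ACGTACGTAC", "ACGTACGTTT", "GGGCCCGGGC"]
K = gram_matrix(probes, k=3)
```

A Gram matrix like `K` could then be handed to a regression model that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`, together with the measured binding intensity of each probe as the regression target.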