Journal: Human Genomics

A general integrative genomic feature transcription factor binding site prediction method applied to analysis of USF1 binding in cardiovascular disease

Abstract

Transcription factors are key mediators of human complex disease processes. Identifying the target genes of transcription factors will increase our understanding of the biological network leading to disease risk. The prediction of transcription factor binding sites (TFBSs) is one method to identify these target genes; however, current prediction methods need improvement. We chose the transcription factor upstream stimulatory factor 1 (USF1) to evaluate the performance of our novel TFBS prediction method because of its known genetic association with coronary artery disease (CAD) and the recent availability of USF1 chromatin immunoprecipitation microarray (ChIP-chip) results. The specific goals of our study were to develop a novel and accurate genome-scale method for predicting USF1 binding sites and associated target genes to aid in the study of CAD. Previously published USF1 ChIP-chip data for 1 per cent of the genome were used to develop and evaluate several kernel logistic regression prediction models. A combination of genomic features (phylogenetic conservation, regulatory potential, presence of a CpG island and DNaseI hypersensitivity), as well as position weight matrix (PWM) scores, were used as variables for these models. Our most accurate predictor achieved an area under the receiver operator characteristic curve of 0.827 during cross-validation experiments, significantly outperforming standard PWM-based prediction methods. When applied to the whole human genome, we predicted 24,010 USF1 binding sites within 5 kilobases upstream of the transcription start site of 9,721 genes. These predictions included 16 of 20 genes with strong evidence of USF1 regulation. Finally, in the spirit of genomic convergence, we integrated independent experimental CAD data with these USF1 binding site prediction results to develop a prioritised set of candidate genes for future CAD studies.
We have shown that our novel prediction method, which employs genomic features related to the presence of regulatory elements, enables more accurate and efficient prediction of USF1 binding sites. This method can be extended to other transcription factors identified in human disease studies to help further our understanding of the biology of complex disease.
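
As a rough illustration of two ingredients named in the abstract, the sketch below scores a sequence against a position weight matrix and computes the area under the ROC curve for a set of labelled scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the count matrix `TOY_COUNTS`, the uniform background frequency and all function names are hypothetical, and the kernel logistic regression model that combines PWM scores with the other genomic features is not reproduced here.

```python
import math

# Hypothetical 4-position base-count matrix (one dict per motif position).
# These numbers are invented for illustration and are not the USF1 matrix.
TOY_COUNTS = [
    {"A": 1, "C": 7, "G": 1, "T": 1},
    {"A": 7, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 7, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 7},
]


def pwm_from_counts(counts, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values())
        pwm.append(
            {base: math.log2((n / total) / background) for base, n in column.items()}
        )
    return pwm


def best_pwm_score(sequence, pwm):
    """Return the highest log-odds score over all motif-width windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


pwm = pwm_from_counts(TOY_COUNTS)
print(best_pwm_score("TTCAGTAA", pwm))           # best window is the exact match "CAGT"
print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

In a setting like the one described, each candidate region would receive a score such as `best_pwm_score` alongside the other genomic features, and `roc_auc` would be evaluated on held-out regions whose bound or unbound labels come from the ChIP-chip data.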
