
Discriminative discovery of transcription factor binding sites from location data



Abstract

Genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offer new insight for in silico analysis of transcriptional regulation. We propose a novel discriminative discovery framework that uses genome-wide location data to identify transcriptional regulatory motifs precisely from both positive and negative samples, that is, the sets of upstream sequences of genes bound and not bound by a transcription factor (TF). Our goal in this framework is to find the discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree, and we extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif and iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences reported in the literature for known TFs and achieves significant performance in discriminating between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. Our learned profile-HMMs also improve on false negative predictions in the ChIP data.
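As a rough illustration of the first step only, the sketch below scores each fixed-length substring by the information gain of a presence/absence split between positive and negative sequences, which is the criterion a decision tree learner would typically use to choose a single node. It is not the authors' implementation: the fixed substring length `k`, the function names, and the toy sequences are all assumptions made for this example, and the clustering and profile-HMM refinement steps are not covered.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Binary entropy (in bits) of a node holding pos positive and neg negative samples."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def kmers(seq, k):
    """Set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_discriminative_kmer(positives, negatives, k):
    """Return (substring, information gain) for the best presence/absence split.

    Hypothetical helper: a single decision-tree node, not a full tree.
    """
    n_pos, n_neg = len(positives), len(negatives)
    total = n_pos + n_neg
    base = entropy(n_pos, n_neg)
    # Count, per substring, how many sequences of each class contain it.
    pos_counts = Counter(m for s in positives for m in kmers(s, k))
    neg_counts = Counter(m for s in negatives for m in kmers(s, k))
    best = (None, -1.0)
    for m in sorted(set(pos_counts) | set(neg_counts)):
        p_in, n_in = pos_counts[m], neg_counts[m]
        p_out, n_out = n_pos - p_in, n_neg - n_in
        remainder = ((p_in + n_in) / total) * entropy(p_in, n_in) \
            + ((p_out + n_out) / total) * entropy(p_out, n_out)
        gain = base - remainder
        if gain > best[1]:
            best = (m, gain)
    return best


# Toy data (made up for illustration): every "bound" sequence contains CACGTG.
bound = ["AACACGTGTT", "TTCACGTGA", "GCACGTGC"]
unbound = ["AATTTTGG", "CCCCGGAA", "TTGGAACC"]
print(best_discriminative_kmer(bound, unbound, k=6))
```

On this toy input the substring present in all bound sequences and no unbound ones yields a perfect split. Real upstream sequences would rarely separate this cleanly, which is presumably why the framework goes on to cluster similar substrings and refine them as profile-HMMs.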
