Conference: IEEE Computational Systems Bioinformatics Conference

Discriminative discovery of transcription factor binding sites from location data

Abstract

The availability of genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offers new insight into the in silico analysis of transcriptional regulation. We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples, that is, the sets of upstream sequences of genes bound and not bound by a transcription factor (TF), based on the genome-wide location data. In this framework, our goal is to find the discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree. We extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif, and iteratively refine the profile-HMMs to improve their discrimination accuracy. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences reported in the literature for known TFs, and further achieves significant performance in discriminating between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. Our learned profile-HMMs also improve false-negative predictions in ChIP data.
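As a rough illustration of the first step only, the sketch below scores candidate substrings by information gain, the criterion a decision tree commonly uses to choose a split, and picks the one that best separates bound from unbound sequences. It is a simplified stand-in and not the authors' implementation: the function names, the fixed substring length `k`, and the toy sequences are all assumptions made for this example, and the clustering and profile-HMM refinement steps are not shown.

```python
from math import log2


def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative items."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(substring, positives, negatives):
    """Entropy reduction from splitting sequences on presence of `substring`."""
    pos_with = sum(substring in s for s in positives)
    neg_with = sum(substring in s for s in negatives)
    pos_without = len(positives) - pos_with
    neg_without = len(negatives) - neg_with
    total = len(positives) + len(negatives)

    before = entropy(len(positives), len(negatives))
    after = (
        (pos_with + neg_with) / total * entropy(pos_with, neg_with)
        + (pos_without + neg_without) / total * entropy(pos_without, neg_without)
    )
    return before - after


def best_substring(positives, negatives, k):
    """Return the length-k substring with the highest information gain."""
    # Hypothetical simplification: candidates are all k-mers seen in any sequence.
    candidates = sorted(
        {s[i:i + k] for s in positives + negatives for i in range(len(s) - k + 1)}
    )
    return max(candidates, key=lambda c: information_gain(c, positives, negatives))


# Toy data (invented for illustration): upstream sequences of bound and unbound genes.
bound = ["ACGTGACC", "TTCGTGAA", "GGCGTGAT"]
unbound = ["ACCTTAGG", "TTAAGGCC", "GATTACAA"]

root_split = best_substring(bound, unbound, k=5)
```

In a full tree learner this selection would be repeated on each resulting subset, and the substrings at the internal nodes would then be grouped into clusters as the abstract describes.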
