Nucleic Acids Research

CSI-Tree: a regression tree approach for modeling binding properties of DNA-binding molecules based on cognate site identification (CSI) data

Abstract

The identification and characterization of binding sites of DNA-binding molecules, including transcription factors (TFs), is a critical problem at the interface of chemistry, biology and molecular medicine. The Cognate Site Identification (CSI) array is a high-throughput microarray platform for measuring comprehensive recognition profiles of DNA-binding molecules. This technique produces datasets that are useful not only for identifying binding sites of previously uncharacterized TFs but also for elucidating dependencies, both local and nonlocal, between the nucleotides at different positions of the recognition sites. We have developed a regression tree technique, CSI-Tree, for exploring the spectrum of binding sites of DNA-binding molecules. Our approach constructs regression trees utilizing the CSI data of unaligned sequences. The resulting model partitions the binding spectrum into homogeneous regions of position-specific nucleotide effects. Each homogeneous partition is then summarized by a position weight matrix (PWM). Hence, the final outcome is a collection of PWMs, rank-ordered by binding intensity, each of which spans a different region in the binding spectrum. Nodes of the regression tree depict the critical position/nucleotide combinations. We analyze the CSI data of the eukaryotic TF Nkx-2.5 and two engineered small molecule DNA ligands and obtain unique insights into their binding properties. The CSI tree for Nkx-2.5 reveals an interaction between two positions of the binding profile and elucidates how different nucleotide combinations at these two positions lead to different binding affinities. The CSI trees for the engineered DNA ligands exhibit a common preference for the dinucleotide AA in the first two positions, which is consistent with a preference for a narrow and relatively flat minor groove. We carry out a reanalysis of these data with a mixture-of-PWMs approach. This approach is an advancement over the simple PWM model and accommodates position dependencies based on sequence data alone. Our analysis indicates that the dependencies revealed by the CSI-Tree are challenging to discover without the actual binding intensities. Moreover, such a mixture model is highly sensitive to the number and length of the sequences analyzed. In contrast, CSI-Tree provides interpretable and concise summaries of the complete recognition profiles of DNA-binding molecules by utilizing binding affinities.
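
The pipeline described in the abstract (split on position/nucleotide tests, then summarize each partition by a PWM and rank the PWMs by binding intensity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes sequences that are already aligned and of equal length, which sidesteps the unaligned-sequence handling of CSI-Tree, and it assumes a CART-style squared-error splitting criterion that the abstract does not specify. All function names, parameters and the toy intensities are hypothetical and chosen only for illustration.

```python
from collections import Counter

BASES = "ACGT"

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(seqs, ys, min_leaf):
    """Return (gain, position, base) for the best 'seq[position] == base' test."""
    best = None
    for pos in range(len(seqs[0])):
        for base in BASES:
            yes = [y for s, y in zip(seqs, ys) if s[pos] == base]
            no = [y for s, y in zip(seqs, ys) if s[pos] != base]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue  # skip splits that leave a child too small
            gain = sse(ys) - sse(yes) - sse(no)
            if best is None or gain > best[0]:
                best = (gain, pos, base)
    return best

def grow(seqs, ys, max_depth=2, min_leaf=2, min_gain=0.5):
    """Grow a regression tree; leaves keep their sequences and mean intensity."""
    split = best_split(seqs, ys, min_leaf) if max_depth > 0 else None
    if split is None or split[0] < min_gain:
        return {"leaf": True, "seqs": seqs, "mean": sum(ys) / len(ys)}
    _, pos, base = split
    node = {"leaf": False, "pos": pos, "base": base}
    for key, keep in (("yes", True), ("no", False)):
        rows = [(s, y) for s, y in zip(seqs, ys) if (s[pos] == base) == keep]
        node[key] = grow([s for s, _ in rows], [y for _, y in rows],
                         max_depth - 1, min_leaf, min_gain)
    return node

def leaves(tree):
    """Collect all leaf nodes of the tree."""
    if tree["leaf"]:
        return [tree]
    return leaves(tree["yes"]) + leaves(tree["no"])

def pwm(seqs, pseudocount=0.0):
    """Per-position nucleotide frequencies for one partition."""
    total = len(seqs) + 4 * pseudocount
    matrix = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix

def ranked_pwms(tree):
    """One PWM per leaf, ordered from highest to lowest mean intensity."""
    ordered = sorted(leaves(tree), key=lambda leaf: leaf["mean"], reverse=True)
    return [(leaf["mean"], pwm(leaf["seqs"])) for leaf in ordered]

# Toy data (made up): intensity is highest for AA at the first two positions.
seqs = ["AAC", "AAG", "AAT", "ACG", "ATC", "AGT",
        "CAG", "GAT", "TCC", "GTA", "CGA", "TTG"]
intensities = [9.0, 9.2, 8.8, 4.0, 4.2, 3.8,
               1.0, 1.2, 0.8, 1.1, 0.9, 1.0]

tree = grow(seqs, intensities)
ranked = ranked_pwms(tree)
for mean, matrix in ranked:
    print(round(mean, 2), [max(col, key=col.get) for col in matrix])
```

In this toy example the second split only appears beneath the first one, which is how a tree can expose a dependency between two positions that a single PWM averages away. The stopping parameters (`max_depth`, `min_leaf`, `min_gain`) are arbitrary illustrative choices.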
