Source: Nucleic Acids Research (U.S. National Institutes of Health literature)
Discovering protein–DNA binding sequence patterns using association rule mining

Abstract

Protein–DNA binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) plays an essential role in transcriptional regulation. Over the past decades, significant effort has gone into studying the principles of protein–DNA binding. However, it is generally accepted that there are no simple one-to-one rules between amino acids and nucleotides, and many methods rely on complicated features beyond sequence patterns. Protein–DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. It is therefore desirable to investigate associated sequence patterns between TFs and TFBSs. Given increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF–TFBS binding sequence patterns from TRANSFAC in the most explicit and interpretable form. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the discovered patterns reveal real TF–TFBS bindings across different TFs and TFBSs, which can drive further research toward a better understanding of TF–TFBS binding.
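
To make the idea of association rule mining over TF–TFBS sequence pairs concrete, the following is a minimal sketch, not the authors' implementation. It assumes each record is a (TF amino acid sequence, TFBS DNA sequence) pair, treats every k-mer as an item, and applies the Apriori pruning principle (a pair can only be frequent if both of its k-mers are frequent). The function names, the k-mer lengths, the support threshold and the toy sequences are all hypothetical.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def frequent_pairs(records, k_aa=3, k_dna=4, min_support=2):
    """Level-wise (Apriori-style) search for associated k-mer pairs.

    records: iterable of (tf_sequence, tfbs_sequence) tuples (assumed format).
    Returns {(aa_kmer, dna_kmer): {"support": count, "confidence": ratio}}.
    """
    aa_counts, dna_counts = Counter(), Counter()
    transactions = []
    # Level 1: count each k-mer at most once per record.
    for tf_seq, tfbs_seq in records:
        aa, dna = kmers(tf_seq, k_aa), kmers(tfbs_seq, k_dna)
        transactions.append((aa, dna))
        aa_counts.update(aa)
        dna_counts.update(dna)
    freq_aa = {m for m, c in aa_counts.items() if c >= min_support}
    freq_dna = {m for m, c in dna_counts.items() if c >= min_support}

    # Level 2: only combine k-mers that are frequent on their own,
    # since an infrequent item cannot belong to a frequent pair.
    pair_counts = Counter()
    for aa, dna in transactions:
        for a in aa & freq_aa:
            for d in dna & freq_dna:
                pair_counts[(a, d)] += 1

    rules = {}
    for (a, d), c in pair_counts.items():
        if c >= min_support:
            # Confidence of the rule "TF contains a => TFBS contains d".
            rules[(a, d)] = {"support": c, "confidence": c / aa_counts[a]}
    return rules


# Toy, made-up sequences purely for illustration (not TRANSFAC data).
records = [
    ("MKRARN", "ACGTGA"),
    ("GGKRAR", "TTACGT"),
    ("LLLLLL", "CCCCCC"),
]
rules = frequent_pairs(records)
for pair, stats in sorted(rules.items()):
    print(pair, stats)
```

In this sketch, support is the number of records containing both k-mers and confidence is that count divided by the number of records containing the amino acid k-mer. A real analysis would need a curated dataset and additional evaluation, as the abstract describes.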