International Conference on Applied Artificial Intelligence

Training models employing physico-chemical properties of DNA for protein binding site detection



Abstract

Transcription Factors (TFs) are among the most important agents acting on gene expression regulation, fundamentally determining the organized functional operation of the cellular machinery. At the molecular level, this effect is achieved by the sequence-specific physical binding of TF proteins to particular parts of the DNA. TFs regulate gene expression in complex ways, and detecting their binding sites is an important part of many experiments. Predicting Transcription Factor Binding Sites (TFBSs) from DNA sequence data has been a challenging task in bioinformatics. The abundance of available DNA sequences strongly encourages the use of machine learning for this problem. Until now, most of these efforts have been based primarily on the traditional nucleotide-based representation of DNA. To describe this macromolecule in more detail, we have developed a new Physico-Chemical Descriptor (PCD) based DNA representation and used it as input for training neural networks to predict TFBSs. We show that the PCD representation is a viable format for deep learning models, and our feature selection investigation highlights the importance of choosing proper PCD subsets. The distinct prediction efficiencies observed with arbitrarily selected feature subsets indicate that different DNA features affect the DNA binding process of TFs to varying extents.
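As a rough illustration of the contrast drawn above between a nucleotide-based input and a descriptor-based input, the following Python sketch encodes the same sequence both ways. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say which descriptors were used or at what granularity, so the dinucleotide-step lookup, the names `DESCRIPTOR_NAMES`, `PLACEHOLDER_TABLE`, and `pcd_encode`, and every numeric value in the table are hypothetical placeholders rather than measured physico-chemical properties.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Traditional nucleotide-based encoding: one 4-element vector per base."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


# Hypothetical descriptor names; the source does not list the real ones.
DESCRIPTOR_NAMES = ["descriptor_a", "descriptor_b", "descriptor_c"]

# Placeholder table: one row of arbitrary numbers per dinucleotide.
# These are NOT real physico-chemical measurements; substitute a real table.
PLACEHOLDER_TABLE = {
    "".join(pair): [float(i), float(i % 4), float(i // 4)]
    for i, pair in enumerate(product(NUCLEOTIDES, repeat=2))
}


def pcd_encode(seq, table=PLACEHOLDER_TABLE, keep=None):
    """Encode each overlapping dinucleotide step as a descriptor vector.

    keep: optional list of descriptor column indices, i.e. a feature subset.
    """
    seq = seq.upper()
    rows = [table[seq[i:i + 2]] for i in range(len(seq) - 1)]
    if keep is not None:
        rows = [[row[j] for j in keep] for row in rows]
    return rows
```

Calling `pcd_encode(seq, keep=[0, 2])` keeps only the first and third descriptor columns, which is one simple way to feed different descriptor subsets to a model and compare the resulting prediction quality, in the spirit of the feature selection investigation mentioned in the abstract.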
