Journal of Chemical Information and Modeling

Identification of Highest-Affinity Binding Sites of Yeast Transcription Factor Families


Abstract

Transcription factors (TFs) play a crucial role in controlling key cellular processes and in responding to the environment. Yeast, a single-celled fungus, is a vital model organism for studying transcription and translation in basic biology. Transcriptional control in yeast cells has been modeled and studied extensively with both traditional methods and high-throughput technologies. However, the identities of the TFs that regulate major functional categories of genes remain unknown. Given the avalanche of biological data in the post-genomic era, there is an urgent need for automated computational methods that can accurately identify efficient TF binding sites among a large number of candidates. In this paper, we analyzed high-resolution DNA-binding profiles and motifs for TFs, covering all possible contiguous 8-mers. First, we divided all 8-mer motifs into 16 categories and selected samples from each category by applying an E-score threshold. We then employed five feature representation methods and applied four feature selection methods to filter out uninformative features. Finally, we used Extreme Gradient Boosting (XGBoost) as the base classifier and adopted a one-vs-rest strategy, building 16 binary classifiers to solve this multiclass classification problem. In our experiments, the method achieved its best performance with an overall accuracy of 79.72% and a Matthews correlation coefficient (MCC) of 0.77. We also identified similarity relationships among the categories from different TF families and obtained sequence motif diagrams via multiple sequence alignment. The complexity of DNA recognition may play an important role in the evolution of gene regulation. Source code is available at https://github.com/guofei-tju/tfbs.
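The abstract gives no implementation details, so the following is only a minimal, stdlib-only Python sketch of generic pieces such a pipeline relies on; it is not the authors' code (that lives in the linked repository). It shows (1) enumerating all contiguous 8-mers, (2) one-vs-rest relabeling and combining per-class scores, and (3) accuracy and the multiclass MCC (Gorodkin's R_K generalization, which reduces to the usual binary MCC). Merging each k-mer with its reverse complement is an assumption: it is common for 8-mer E-score tables, but the abstract does not state it. The XGBoost base classifiers, the five feature encodings, the four feature selectors, the E-score threshold, and the 16 category definitions are not reproduced; `ovr_predict` takes hypothetical per-class scores in their place.

```python
import math
from collections import Counter
from itertools import product

# Complement table for the DNA alphabet A/C/G/T.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list:
    """Enumerate all contiguous k-mers, merging each with its reverse complement.

    Assumption (not stated in the abstract): a k-mer and its reverse complement
    count as one site; the lexicographically smaller string is kept.
    """
    canon = set()
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        canon.add(min(kmer, reverse_complement(kmer)))
    return sorted(canon)


def one_vs_rest_labels(labels, positive):
    """Relabel a multiclass target as 1 (positive class) vs 0 (all other classes)."""
    return [1 if y == positive else 0 for y in labels]


def ovr_predict(class_scores: dict):
    """Combine one-vs-rest outputs by picking the class whose binary model scored highest.

    `class_scores` maps class -> score from that class's binary classifier
    (hypothetical input standing in for the paper's per-class XGBoost models).
    """
    return max(class_scores, key=class_scores.get)


def accuracy(y_true, y_pred) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred) -> float:
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K statistic).

    Equals the binary MCC when there are two classes; returns 0.0 when the
    denominator is zero (e.g. every sample is predicted as the same class).
    """
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_count = Counter(y_true)  # true occurrences per class
    p_count = Counter(y_pred)  # predicted occurrences per class
    classes = set(t_count) | set(p_count)
    cov_tp = correct * n - sum(t_count[c] * p_count[c] for c in classes)
    cov_tt = n * n - sum(v * v for v in t_count.values())
    cov_pp = n * n - sum(v * v for v in p_count.values())
    denom = math.sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0
```

For example, `len(canonical_kmers(8))` returns 32896, i.e. (4^8 + 4^4) / 2 distinct 8-mers once reverse complements are merged, and `multiclass_mcc([1, 1, 0, 0], [1, 0, 0, 0])` gives about 0.577, the same value as the binary MCC formula. The MCC here follows the same multiclass definition that scikit-learn's `matthews_corrcoef` uses, so either can be used to check the other.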

Bibliographic details

  • Author affiliations

    College of Intelligence and Computing, School of Computer Science and Technology, Tianjin University, Tianjin 300350, China

  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Chemistry; chemical industry