Deep convolutional neural networks for predicting leukemia-related transcription factor binding sites from DNA sequence data

Abstract

Transcription factors are proteins that bind to specific DNA sequences to regulate gene expression. Identifying transcription factor binding sites in DNA sequences is important for building regulatory models of biological systems and for identifying pathogenic variants. Traditional machine-learning methods have been used successfully for biological prediction problems based on DNA or protein sequences, but they all require manual extraction of numerical features, which is tedious and can discard useful information in the primary sequence. In this paper, we use deep learning (DL) to build prediction models for transcription factor binding sites directly from raw DNA base sequences. We propose a DL method that combines a convolutional neural network (CNN) with long short-term memory (LSTM) and use it to investigate four leukemia categories from the perspective of transcription factor binding sites, using four large non-redundant datasets for acute, chronic, myeloid and lymphatic leukemia, respectively. Compared with three widely used machine-learning methods, artificial neural network (ANN), support vector machine (SVM) and random forest (RF), the DL method performs clearly better: the prediction accuracies of the three machine-learning models, whether based on sequence features or on k-mer features, are all lower than that of the DL model. The DL models for the four leukemia categories give an average prediction accuracy of 75% using only sequence segments of 101 bases, which indicates that the DL-based method is promising and has advantages over traditional machine-learning methods. For leukemia-related transcription factor binding site prediction specifically, further work, such as optimizing the segment length and the CNN architecture, is needed to improve the current prediction accuracy.
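The abstract contrasts two ways of presenting a DNA segment to a model: the raw base sequence (used by the DL model) and hand-built k-mer features (used by the ANN, SVM and RF baselines). The sketch below illustrates both under stated assumptions. The abstract does not say how bases are encoded, so one-hot encoding is assumed here as a common choice; the function names `one_hot` and `kmer_counts`, the value k = 3, and the toy segment are all hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T).

    Assumption: bases outside ACGT (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def kmer_counts(seq, k=3):
    """Count overlapping k-mers, returned in fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    return [counts[kmer] for kmer in vocab]

segment = "ACGT" * 25 + "A"      # toy 101-base segment, not real data
encoded = one_hot(segment)       # 101 rows x 4 columns, raw-sequence input
features = kmer_counts(segment)  # 64 counts for k = 3, hand-built features
print(len(encoded), len(encoded[0]), len(features), sum(features))
# prints: 101 4 64 99
```

The one-hot matrix keeps the position of every base, which is what a convolutional layer can scan for motifs. The k-mer vector has a fixed length regardless of segment length but discards where each k-mer occurred, which is one way manual features can lose sequence information.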

Bibliographic information

Source
Chemometrics and Intelligent Laboratory Systems | 2020, issue 2020 | 6 pages
Original format: PDF
Language: English
CLC classification: Metrology
Keywords
Transcription factor binding site; Deep learning; Machine-learning; DNA sequence; Leukemia;

