Journal: International Journal of Molecular Sciences

A Computational Framework Based on Ensemble Deep Neural Networks for Essential Genes Identification


Abstract

Essential genes carry key genomic information and may hold the key to a comprehensive understanding of life and evolution. Because of this importance, identifying essential genes is regarded as a crucial problem in computational biology. Computational methods for identifying essential genes have become increasingly popular because they reduce the cost and time required by traditional experiments. A few models have addressed this problem, but their performance remains unsatisfactory because of high-dimensional features and reliance on traditional machine learning algorithms. A novel model is therefore needed to improve predictive performance using DNA sequence features. This study takes advantage of a natural language processing (NLP) model to learn from biological sequences by treating them as natural language words. The resulting NLP features were then learned by a supervised model, an ensemble deep neural network. The proposed method identified essential genes with sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), and area under the receiver operating characteristic curve (AUC) values of 60.2%, 84.6%, 76.3%, 0.449, and 0.814, respectively. Overall, it outperformed the single models without ensembling, as well as state-of-the-art predictors on the same benchmark dataset. This indicates the effectiveness of the proposed method for identifying essential genes in particular, and for other sequence-based problems in general.
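The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how sequences are split into words, how the ensemble combines its members, or how predictions are thresholded, so the overlapping k-mer tokenization, the probability averaging (soft voting), the 0.5 threshold, and all function names below are assumptions made for illustration. The metric formulas are the standard definitions of sensitivity, specificity, accuracy, and MCC; AUC is omitted for brevity.

```python
import math


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (assumed 'words')."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def soft_vote(prob_lists):
    """Average per-sample probabilities across models (assumed ensembling rule)."""
    n_models = len(prob_lists)
    return [sum(probs) / n_models for probs in zip(*prob_lists)]


def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, accuracy, and MCC from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy values invented for illustration; they are not data from the study.
    print(kmer_words("ATGCA", k=3))
    model_a = [0.9, 0.4, 0.2, 0.1]  # hypothetical per-gene probabilities
    model_b = [0.7, 0.2, 0.4, 0.3]
    averaged = soft_vote([model_a, model_b])
    predicted = [1 if p >= 0.5 else 0 for p in averaged]
    print(binary_metrics([1, 1, 0, 0], predicted))
```

Because the classes in such datasets are typically imbalanced, MCC is reported alongside accuracy: it uses all four confusion-matrix counts, so a model that predicts only the majority class scores near zero.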