Conference: International Conference on Intelligent Computing

Three-Layer Dynamic Transfer Learning Language Model for E. Coli Promoter Classification

Abstract

Classification of functional genomic regions (such as promoters or enhancers) based on sequence data alone is an important problem. Various data mining algorithms can be applied to predict promoter regions. Examples include tree-based algorithms such as Classification And Regression Tree (CART), machine learning algorithms such as Simple Logistic, BayesNet, and Random Forest, and popular deep learning models such as Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN). However, because a large amount of genetic data is unlabeled, these supervised methods cannot make direct use of it. Therefore, we present a three-layer dynamic transfer learning language model (TLDTLL) for E. coli promoter classification. TLDTLL is an effective algorithm for inductive transfer learning that utilizes pre-training on large unlabeled genomic corpora. This is particularly advantageous for genomics data, which tends to contain significant volumes of unlabeled sequence. TLDTLL shows improved results over existing methods for classification of E. coli promoters using only sequence data.
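
As a rough illustration of why unlabeled sequence is usable for language-model pre-training, the sketch below turns a raw DNA string into self-supervised training pairs. It assumes overlapping k-mer tokenization and a next-token objective; the abstract specifies neither, and the function names `kmer_tokenize` and `next_token_pairs` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def kmer_tokenize(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mers (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def next_token_pairs(
    tokens: List[str], context: int = 2
) -> List[Tuple[Tuple[str, ...], str]]:
    """Build (context, target) pairs where the target is the next token.

    No promoter labels are needed: the sequence supplies its own targets.
    """
    return [
        (tuple(tokens[i - context:i]), tokens[i])
        for i in range(context, len(tokens))
    ]


# Toy unlabeled fragment (made up for illustration, not real E. coli data).
tokens = kmer_tokenize("ATGCGT", k=3)
pairs = next_token_pairs(tokens, context=2)
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT']
print(pairs)   # [(('ATG', 'TGC'), 'GCG'), (('TGC', 'GCG'), 'CGT')]
```

A model pre-trained on pairs like these could then be fine-tuned on the much smaller labeled promoter set, which is the general shape of inductive transfer learning the abstract describes.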
