Journal: Molecular genetics and genomics: MGG

iN6-methylat (5-step): identifying DNA N-6-methyladenine sites in rice genome using continuous bag of nucleobases via Chou's 5-step rule

Abstract

DNA N-6-methyladenine is a non-canonical DNA modification that occurs at low levels in different eukaryotes and has been identified as having an extremely important biological function. About 0.2% of adenines in the rice genome are marked by DNA N-6-methyladenine, a higher proportion than in most other species. Identifying these sites has therefore become an important area of biological research. A few computational tools have been applied to this problem, but considerable effort is still required to improve their performance. In this study, we represent DNA sequences as continuous bags of nucleobases, including sub-word information of their biological words, and feed the resulting features into a support vector machine to identify N-6-methyladenine sites. In a jackknife test, this hybrid model achieved a sensitivity of 86.48%, a specificity of 89.09%, an accuracy of 87.78%, and an MCC of 0.756. Compared with the state-of-the-art predictor and the other methods, the proposed model yields superior performance on all metrics. This study also provides a basis for further research on applying natural language processing techniques to biological sequences.
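The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the word length, the sub-word n-gram range, or the boundary markers, so `k=3`, `n_min=2`, `n_max=3`, the `<`/`>` markers, and all function names are assumptions chosen for illustration. The sketch shows how a sequence might be split into overlapping "biological words", how sub-word n-grams of a word might be enumerated, and how the four reported metrics follow from confusion-matrix counts.

```python
import math


def kmer_words(seq, k=3):
    """Split a DNA sequence into overlapping k-mers (k=3 is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def subword_ngrams(word, n_min=2, n_max=3):
    """List character n-grams of a word wrapped in assumed '<' and '>' markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(marked[i:i + n] for i in range(len(marked) - n + 1))
    return grams


def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a common convention, assumed here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    words = kmer_words("ACGTA")            # toy sequence, not from the paper
    print(words)
    print(subword_ngrams(words[0]))
    # Toy counts for illustration only; they are not the paper's results.
    print(binary_metrics(tp=8, tn=9, fp=1, fn=2))
```

In a full pipeline, vectors learned for these words and sub-words would be combined into a fixed-length feature vector for each sequence and passed to a support vector machine. That step is omitted here because the abstract does not describe how the features are combined.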