Journal: International Journal of Molecular Sciences

LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion

Abstract

Recent studies have revealed that the subcellular location of long non-coding RNAs (lncRNAs) provides significant information about their function. Because experimental data are scarce, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are highly imbalanced. Predicting the subcellular location of lncRNAs is therefore a small-sample, imbalanced multi-class classification problem. This imbalance leads machine learning models to recognize the small classes poorly, which remains a challenging problem in existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance part of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representational power of the data and alleviates the problem of imbalanced multi-class data. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal feature set and classifier to build lncLocation. lncLocation achieves an accuracy of 87.78% under 5-fold cross-validation on the benchmark data, which is higher than that of state-of-the-art tools, and its classification performance is improved significantly, especially for small classes.
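The abstract names a binomial distribution-based filtering method but does not spell it out. The sketch below shows one common formulation of that idea applied to k-mer counts, and it is an assumption rather than the authors' exact procedure. For each k-mer, it asks how unlikely the observed count in a class would be if occurrences were spread across classes in proportion to class size, and scores the k-mer by 1 minus that tail probability. The function names (`kmer_counts`, `binomial_tail`, `binomial_confidence`), the choice of k-mer features, and the toy sequences and labels are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def kmer_counts(seq, k):
    """Count overlapping k-mers over the alphabet ACGT in one sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA spellings alike
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in kmers}


def binomial_tail(n, m, q):
    """Return P(X >= m) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(m, n + 1))


def binomial_confidence(seqs, labels, k=2):
    """Score each k-mer by its highest confidence level over all classes.

    Confidence for k-mer i in class j is 1 - P(X >= n_ij), where
    X ~ Binomial(N_i, q_j), N_i is the k-mer's total count over all classes,
    and q_j is the share of all k-mer occurrences that fall in class j.
    """
    per_class = {}
    for seq, label in zip(seqs, labels):
        totals = per_class.setdefault(label, Counter())
        totals.update(kmer_counts(seq, k))
    grand_total = sum(sum(c.values()) for c in per_class.values())

    scores = {}
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        n = sum(c[kmer] for c in per_class.values())
        best = 0.0
        for c in per_class.values():
            q = sum(c.values()) / grand_total
            best = max(best, 1.0 - binomial_tail(n, c[kmer], q))
        scores[kmer] = best
    return scores


# Toy data, invented purely for illustration (not from the benchmark set).
toy_seqs = ["AAAAAAAA", "AAAAAACG", "CGCGCGCG", "CGCGCGAA"]
toy_labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
toy_scores = binomial_confidence(toy_seqs, toy_labels, k=2)

# Keep only k-mers whose confidence exceeds an (assumed) threshold.
selected = sorted(kmer for kmer, s in toy_scores.items() if s > 0.95)
```

A filter like this would typically run before a wrapper method such as RFE, so that the more expensive elimination loop starts from a smaller candidate set. The 0.95 threshold here is only a placeholder and would need tuning on real data.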