Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions


Abstract

Long noncoding RNAs (lncRNAs) are ubiquitous in organisms and play crucial roles in a variety of biological processes and complex diseases. Emerging evidence suggests that lncRNAs interact with corresponding proteins to perform their regulatory functions. Identifying interacting lncRNA-protein pairs is therefore the first step in understanding the function and mechanism of lncRNAs. Because determining lncRNA-protein interactions by high-throughput experiments is time-consuming and expensive, more robust and accurate computational methods need to be developed. In this study, we developed a new method for predicting potential lncRNA-protein interactions based on learning distributed representations of sequences, named LPI-Pred, which is inspired by the similarity between natural language and biological sequences. More specifically, lncRNA and protein sequences were divided into k-mer segments, which can be regarded as "words" in natural language processing. We then trained the RNA2vec and Pro2vec models using word2vec on genome-wide lncRNA and protein sequences to learn distributed representations of RNA and protein. Next, the dimension of the resulting features was reduced by feature selection based on the Gini impurity measure. Finally, these discriminative features were used to train a Random Forest classifier to predict lncRNA-protein interactions. Five-fold cross-validation was adopted to evaluate the performance of LPI-Pred on three benchmark datasets: RPI369, RPI488 and RPI2241. The results demonstrate that LPI-Pred can be a useful tool to provide reliable guidance for biological research.
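
To make two of the steps named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how a sequence can be cut into k-mer "words" with a sliding window, and how the standard Gini impurity of a set of class labels is computed. The function names, the choice of k = 3, the stride of 1, and the toy sequence are all assumptions made for illustration; the abstract does not state the values used in LPI-Pred.

```python
from collections import Counter


def kmer_words(sequence, k=3, stride=1):
    """Split a sequence into k-mer 'words' with a sliding window.

    k and stride are illustrative defaults, not values from the paper.
    """
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def gini_impurity(labels):
    """Standard Gini impurity, 1 - sum(p_c ** 2), of a collection of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


# Toy RNA fragment (hypothetical, for illustration only).
rna_words = kmer_words("AUGGCUA", k=3)
print(rna_words)                    # ['AUG', 'UGG', 'GGC', 'GCU', 'CUA']

# A perfectly mixed binary label set has impurity 0.5.
print(gini_impurity([1, 1, 0, 0]))  # 0.5
```

Lists of k-mer words like `rna_words` are the kind of "sentences" a word2vec-style model consumes. Tree-based feature importance is commonly derived from the decrease in Gini impurity at each split, which is one plausible reading of the feature selection step described above.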


Bibliographic details

Journal: Computational and Structural Biotechnology Journal
Authors: Hai-Cheng Yi; Zhu-Hong You; Li Cheng; Xi Zhou; Tong-Hai Jiang; Xiao Li; Yan-Bin Wang
Year (volume): 2020 (18)
Total pages: 7
Original format: PDF
Chinese Library Classification: Biology
Keywords: Distribution representation; Natural language processing; Word2vec; RNA-protein interaction


Similar literature


1. Learning distributed representations of RNA and protein sequences and its application for predicting lncRNA-protein interactions [J]. Hai-Cheng Yi, Zhu-Hong You, Li Cheng, Computational and Structural Biotechnology Journal. 2020, No. 1
2. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions [J]. Wen Zhang, Xiang Yue, Guifeng Tang, PLoS Computational Biology. 2018, No. 12
3. Learning distributed representations of RNA sequences and its application for predicting RNA-protein binding sites with a convolutional neural network [J]. Pan Xiaoyong, Shen Hong-Bin, Neurocomputing. 2018, No. AUGa30
4. Predicting lncRNA-Protein Interactions Based on Protein-Protein Similarity Network Fusion (Extended Abstract) [C]. Xiaoxiang Zheng, Kai Tian, Yang Wang, International Symposium on Bioinformatics Research and Applications. 2016
5. Identification of interface residues involved in protein-protein and protein-DNA interactions from sequence using machine learning approaches [D]. Yan, Changhui. 2005
6. SFPEL-LPI: Sequence-based feature projection ensemble learning for predicting LncRNA-protein interactions [O]. Wen Zhang, Xiang Yue, Guifeng Tang, 2018
7. LPI-KTASLP: Prediction of LncRNA-Protein Interaction by Semi-Supervised Link Learning With Multivariate Information [O]. Cong Shen, Yijie Ding, Jijun Tang, 2019
