Journal: Genomics, Proteomics & Bioinformatics

A Bipartite Network-based Method for Prediction of Long Non-coding RNA–protein Interactions *


Abstract

As one large class of non-coding RNAs (ncRNAs), long ncRNAs (lncRNAs) have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only a few are able to predict these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI). LPBNI aims to identify potential lncRNA-interacting proteins by making full use of the known lncRNA–protein interactions. A leave-one-out cross-validation (LOOCV) test shows that LPBNI significantly outperforms other network-based methods, including random walk with restart (RWR) and protein-based collaborative filtering (ProCF). Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA-interacting proteins.

Keywords: lncRNA; Protein; Interaction; Bipartite network; Propagation

Introduction

An increasing number of studies show that approximately 2% of the whole mammalian genome represents protein-coding genes, whereas the majority of the genome consists of non-coding RNA (ncRNA) genes. ncRNAs had long been regarded as transcriptional noise, but recent investigations demonstrate that ncRNAs play an important role in the regulation of diverse biological processes [1-5]. Long ncRNAs (lncRNAs), which consist of more than 200 nucleotides, constitute a large class of ncRNAs [6, 7]. In the past several years, the number of identified lncRNAs has been increasing sharply because of the development of both bioinformatics tools and experimental techniques.

Functional studies of lncRNAs show that mutated and dysfunctional lncRNAs are implicated in a range of cellular processes [8-12] and human diseases, ranging from neurodegeneration to cancer [13-18]. Although some lncRNAs, e.g., Xist [19] and MALAT1 [20], have been well studied, the functions of most lncRNAs remain unclear. lncRNAs usually function by interacting with RNA-binding proteins (RBPs) [21-24]. Predicting potential lncRNA–protein interactions is therefore important for studying the complex functions of lncRNAs. Since the experimental identification of lncRNA–protein interactions remains costly, developing effective predictive approaches becomes essential.

Recently, several computational methods have been reported for predicting potential lncRNA–protein interactions. For instance, Bellucci et al. developed catRAPID in 2011 [25] by taking into account secondary structure, hydrogen bonds, and van der Waals forces between lncRNAs and proteins. Next, Muppirala et al. [26] introduced a method named RPISeq, which uses only the sequence information of lncRNAs and proteins; support vector machine (SVM) [27] and random forest (RF) [28] classifiers are used to predict RBPs. In 2013, Lu et al. [29] developed a novel approach named lncPro, which uses secondary structure, hydrogen bond, and van der Waals force features, and yields the prediction score using Fisher's linear discriminant method. Later on, Suresh et al. [30] developed an approach named RPI-Pred, in which an SVM-based model is trained on sequence and high-order 3D structure features extracted from lncRNAs and proteins.

All the aforementioned methods are based on the biological characteristics of ncRNAs and proteins: catRAPID and lncPro combine sequence and structural features of lncRNAs and proteins, RPISeq is based on sequence features, and RPI-Pred uses the high-order structure features of lncRNAs and proteins.
However, some studies show that lncRNAs generally exhibit low sequence conservation [1], which may make it difficult to predict interactions based on the intrinsic properties of lncRNAs. Biological network-based methods are widely used in many types of studies, such as disease gene prioritization [31] and drug-target interaction prediction [32]. The development of bioinformatics technologies, such as CLIP-seq and cross-linking immunoprecipitation, has enabled us to construct lncRNA–protein interaction networks.

We introduce here a novel computational method, lncRNA–protein bipartite network inference (LPBNI), for the prediction of lncRNA–protein interactions. LPBNI identifies novel lncRNA–protein pairs by efficiently using the lncRNA–protein bipartite network. To evaluate the performance of the proposed method, we compared LPBNI with other network-based methods, including random walk with restart (RWR) [31] and protein-based collaborative filtering (ProCF) [33]. RWR [31] has been used to prioritize candidate disease genes. ProCF is derived from recommendation algorithms and is similar to the item-based collaborative filtering method [33].
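
To make the idea of inference on a bipartite network concrete, the following is a minimal sketch of a generic two-step resource propagation over known lncRNA–protein pairs. This excerpt does not give the LPBNI formulas, so the equal-split propagation rule, the function names `propagate_scores` and `rank_candidates`, and the toy interaction list are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from collections import defaultdict


def propagate_scores(interactions, lncrna):
    """Score proteins for one lncRNA by two-step resource propagation.

    Assumed rule (illustrative only): every protein known to bind the query
    lncRNA starts with one unit of resource, which flows protein -> lncRNA ->
    protein, being split equally among neighbors at each step.
    """
    proteins_of = defaultdict(set)  # lncRNA -> proteins it is known to bind
    lncrnas_of = defaultdict(set)   # protein -> lncRNAs it is known to bind
    for lnc, prot in interactions:
        proteins_of[lnc].add(prot)
        lncrnas_of[prot].add(lnc)

    # Initial resource: one unit on each known partner of the query lncRNA.
    initial = {prot: 1.0 for prot in proteins_of.get(lncrna, set())}

    # Step 1: each protein divides its resource equally among its lncRNAs.
    on_lncrna = defaultdict(float)
    for prot, amount in initial.items():
        share = amount / len(lncrnas_of[prot])
        for lnc in lncrnas_of[prot]:
            on_lncrna[lnc] += share

    # Step 2: each lncRNA divides what it received equally among its proteins.
    final = defaultdict(float)
    for lnc, amount in on_lncrna.items():
        share = amount / len(proteins_of[lnc])
        for prot in proteins_of[lnc]:
            final[prot] += share
    return dict(final)


def rank_candidates(interactions, lncrna):
    """Return proteins not yet linked to the lncRNA, highest score first."""
    known = {prot for lnc, prot in interactions if lnc == lncrna}
    scores = propagate_scores(interactions, lncrna)
    candidates = [(p, s) for p, s in scores.items() if p not in known]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical toy network: three lncRNAs and three proteins.
example = [("L1", "P1"), ("L1", "P2"), ("L2", "P2"), ("L2", "P3"), ("L3", "P3")]
print(rank_candidates(example, "L1"))
```

In this toy network, L1 is known to bind P1 and P2, and P3 receives a score of 0.25 only because it shares the partner lncRNA L2 with P2. The total resource is conserved across both steps, so scores for different proteins are comparable for a given query lncRNA. A leave-one-out evaluation in the same spirit would remove one known pair at a time, rerun the scoring, and check where the held-out protein lands in the ranking.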
