IEEE/ACM Transactions on Computational Biology and Bioinformatics

Robust Inductive Matrix Completion Strategy to Explore Associations Between LincRNAs and Human Disease Phenotypes

Abstract

Over the past few years, a number of long intergenic non-coding RNAs (lincRNAs) have been linked to a wide variety of human diseases. How many other lincRNAs relate to disease remains a puzzle. Validating such links between the two entities through biological experiments is expensive. However, large amounts of information about both are becoming available thanks to High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), and similar sources, which opens an opportunity for machine learning and data mining approaches. Even so, only a few in silico lincRNA-disease association inference tools are available to date, and none of them uses side information about both entities. The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform between two entities that takes their respective side information into account. The standard IMC formulation, however, cannot handle noise and outliers that may be present in the dataset, and data sparsity is another issue with the standard method. A robust version of IMC is therefore needed to address these two issues. In this paper, we propose Robust Inductive Matrix Completion (RIMC), which uses an l(2,1) norm loss function together with l(2,1) norm based regularization. We applied RIMC to the available association data between human lincRNAs and OMIM disease phenotypes, together with a diverse set of side information about the lincRNAs and the diseases. Our method outperforms the state-of-the-art methods in terms of precision@k and recall@k when prioritizing the top-k diseases for the subject lincRNAs. We also demonstrate that RIMC is equally effective for querying novel lincRNAs and for predicting the rank of a newly known disease for a set of well-characterized lincRNAs. Availability: All the supporting datasets are available at http://biomecis.uta.edu/~ashis/res/RIMC/.
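As a rough illustration of two quantities the abstract names, the sketch below computes an l(2,1) norm and the precision@k and recall@k of a ranked disease list for one lincRNA. It is not the authors' implementation. The function names, the toy data, and the row-wise convention for the l(2,1) norm (sum of the Euclidean norms of the rows) are assumptions made here for illustration; the abstract does not state which convention RIMC uses.

```python
import math


def l21_norm(matrix):
    """l(2,1) norm under the row-wise convention (an assumption):
    the sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def precision_recall_at_k(scores, relevant, k):
    """Precision@k and recall@k for a single lincRNA.

    scores   -- predicted association score for each disease index
    relevant -- set of disease indices known to be associated
    k        -- number of top-ranked diseases to evaluate
    """
    # Rank disease indices from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    hits = sum(1 for j in ranked[:k] if j in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical residual matrix: the first row plays the role of an outlier.
residual = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
residual_l21 = l21_norm(residual)

# Hypothetical scores over five diseases; diseases 0, 3 and 4 are true hits.
precision, recall = precision_recall_at_k([0.9, 0.1, 0.8, 0.3, 0.7], {0, 3, 4}, k=2)
```

In this toy residual the outlier row contributes its norm (5) to the l(2,1) total, whereas a squared Frobenius loss would count it as 25. That linear rather than quadratic growth is the usual reason such a loss is described as less sensitive to outlying rows.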
