Journal: BMC Genomics

A scaling-free minimum enclosing ball method to detect differentially expressed genes for RNA-seq data

Abstract

Identifying differentially expressed genes within the same species or between different species is a pressing need in biological and medical research. RNA-seq experiments usually involve systematic technical effects and differing sequencing depths, so normalization is regarded as an essential step in discovering biologically important changes in expression. Existing methods usually normalize the data with a scaling factor and then detect significant genes. However, because real data are complex, more than one scaling factor may exist. Consequently, methods that normalize data by a single scaling factor may deliver suboptimal performance or may not work at all.

The development of modern machine learning techniques has provided a new perspective on discriminating between differentially expressed (DE) and non-DE genes. In practice, however, the non-DE genes comprise only a small set, which may contain housekeeping genes (within the same species) or conserved orthologous genes (across different species). The detection of DE genes can therefore be formulated as a one-class classification problem, in which only non-DE genes are observed and DE genes are completely absent from the training data.

In this study, we transform the problem into an outlier detection problem by treating DE genes as outliers, and we propose a scaling-free minimum enclosing ball (SFMEB) method that constructs the smallest possible ball containing the known non-DE genes in a feature space. Genes that fall outside the minimum enclosing ball are then naturally considered DE genes. Unlike existing methods, the proposed SFMEB method does not require data normalization, which is particularly attractive when the RNA-seq data involve more than one scaling factor. Furthermore, the SFMEB method can easily be extended to different species without normalization.

Simulation studies demonstrate that the SFMEB method works well in a wide range of settings, especially when the data are heterogeneous or come from biological replicates. Analysis of real data also supports the conclusion that the SFMEB method outperforms existing competitors. The R package for the proposed method is available at https://bioconductor.org/packages/MEB.
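
The enclosing-ball idea described above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the MEB package: the abstract does not specify the feature map or the solver, so this example assumes plain Euclidean space and substitutes the generic Badoiu-Clarkson iteration to approximate the ball. The function names and the toy two-dimensional feature vectors are hypothetical.

```python
import math

def approx_min_enclosing_ball(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the Badoiu-Clarkson iteration (an assumption of this sketch):
    at step i, move the center a fraction 1/(i+1) toward the point
    currently farthest from it.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(center, p))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    # Take the radius as the largest distance, so every training point is enclosed.
    radius = max(math.dist(center, p) for p in points)
    return center, radius

def flag_outside_ball(center, radius, candidates, tol=1e-9):
    """Return True for each candidate lying outside the ball (treated as DE)."""
    return [math.dist(center, g) > radius + tol for g in candidates]

# Hypothetical feature vectors for known non-DE genes (training data).
non_de_genes = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = approx_min_enclosing_ball(non_de_genes)

# Hypothetical genes to classify: one near the training cloud, one far from it.
candidate_genes = [(1.0, 1.0), (5.0, 5.0)]
flags = flag_outside_ball(center, radius, candidate_genes)
print(center, radius, flags)
```

In this sketch only the known non-DE genes are used to fit the ball, which mirrors the one-class setting in the abstract: no DE examples are needed for training, and any gene falling outside the ball is flagged as a DE candidate.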
