Journal: Royal Society Open Science

Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality

Abstract

The application of machine learning to inference problems in biology is dominated by supervised learning problems of regression and classification, and by unsupervised learning problems of clustering and variants of low-dimensional projection for visualization. A class of problems that has received little attention is the detection of outliers in datasets, which can arise from gross experimental, reporting or labelling errors. Outliers may also be small subsets of a dataset that are functionally distinct from the majority of a population. Outliers are often identified by modelling the probability density of normal data and comparing data likelihoods against a threshold. This classical approach suffers from the curse of dimensionality, a serious problem for omics data, which are often very high-dimensional. We develop an outlier detection method based on structured low-rank approximation. The objective function includes a regularizer based on neighbourhood information captured in the graph Laplacian. Results on publicly available genomic data show that our method robustly detects outliers, whereas a density-based method fails even at moderate dimensions. Moreover, we show that the low-dimensional projection recovered by our method gives better clustering and visualization performance than popular dimensionality reduction techniques.
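The abstract contrasts density-based outlier scoring with subspace (low-rank) methods. The sketch below is a rough illustration of the general subspace idea only. It scores each sample by its distance from a rank-k principal subspace fitted with ordinary, non-robust PCA, computed by power iteration with deflation. It is not the authors' method. It omits the structured low-rank constraints and the graph-Laplacian regularizer, and it makes no attempt to stop outliers from influencing the fitted subspace. The function names `subspace_outlier_scores` and `make_demo_data`, along with the toy data, are hypothetical.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subspace_outlier_scores(rows, k=1, iters=200, seed=0):
    """Hypothetical helper: score samples by distance from a rank-k PCA subspace.

    Plain (non-robust) PCA via power iteration with deflation; an illustration
    of subspace-based scoring, not the paper's structured low-rank method.
    """
    n, d = len(rows), len(rows[0])
    # Centre each feature (column) so the subspace passes through the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    resid = [[r[j] - means[j] for j in range(d)] for r in rows]
    rng = random.Random(seed)
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # One power-iteration step: v <- X^T (X v), then normalise.
            proj = [_dot(r, v) for r in resid]
            w = [sum(proj[i] * resid[i][j] for i in range(n)) for j in range(d)]
            norm = math.sqrt(_dot(w, w))
            if norm == 0.0:
                # Nothing left for the subspace to explain: residuals are zero.
                return [math.sqrt(_dot(r, r)) for r in resid]
            v = [x / norm for x in w]
        # Deflate: remove each row's component along the fitted direction.
        for r in resid:
            c = _dot(r, v)
            for j in range(d):
                r[j] -= c * v[j]
    # Outlier score = Euclidean norm of what the k-dim subspace cannot explain.
    return [math.sqrt(_dot(r, r)) for r in resid]


def make_demo_data(n=100, d=20, seed=1):
    """Hypothetical toy data: n inliers near a line in R^d plus one outlier (last row)."""
    rng = random.Random(seed)
    u = [1.0 / math.sqrt(d)] * d                         # direction of the inlier line
    w = [(-1.0) ** j / math.sqrt(d) for j in range(d)]  # orthogonal to u when d is even
    data = []
    for _ in range(n):
        t = rng.gauss(0.0, 3.0)
        data.append([t * u[j] + rng.gauss(0.0, 0.01) for j in range(d)])
    data.append([5.0 * w[j] for j in range(d)])         # planted outlier, off the line
    return data


if __name__ == "__main__":
    scores = subspace_outlier_scores(make_demo_data(), k=1)
    print(max(range(len(scores)), key=scores.__getitem__))  # prints 100 (the planted outlier)
```

The score only measures distance to a k-dimensional subspace. It does not require a density estimate over all d dimensions, which is the step the abstract identifies as breaking down in high dimensions. A robust variant, like the one the paper describes, would additionally prevent outliers from tilting the fitted subspace.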