Journal: The Astrophysical Journal

Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees


Abstract

We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r ≲ 20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
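The trade-off between completeness and efficiency mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code or their actual selection criteria. It assumes the common definitions (completeness: fraction of true members of a class that pass the cut; efficiency: fraction of objects passing the cut that are true members), and the function name, the simple single-probability threshold, and the toy values are all hypothetical.

```python
def completeness_and_efficiency(p_class, is_class, threshold):
    """Return (completeness, efficiency) for the cut p_class >= threshold.

    p_class   -- per-object probability of belonging to the class (hypothetical input)
    is_class  -- per-object truth flag, e.g. from spectroscopy (hypothetical input)
    """
    selected = [p >= threshold for p in p_class]
    n_true = sum(is_class)          # objects that really belong to the class
    n_selected = sum(selected)      # objects that pass the probability cut
    n_correct = sum(1 for s, t in zip(selected, is_class) if s and t)
    completeness = n_correct / n_true if n_true else float("nan")
    efficiency = n_correct / n_selected if n_selected else float("nan")
    return completeness, efficiency


# Toy, made-up values for illustration only (not SDSS data).
p_galaxy = [0.95, 0.80, 0.60, 0.40, 0.10, 0.85]
is_galaxy = [True, True, True, False, False, False]

for threshold in (0.5, 0.9):
    c, e = completeness_and_efficiency(p_galaxy, is_galaxy, threshold)
    print(f"threshold={threshold}: completeness={c:.2f}, efficiency={e:.2f}")
```

In this toy sample, raising the threshold makes the selected set purer but recovers fewer of the true galaxies, which is the kind of trade-off a cut in the three-class probability space has to balance.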
