STAR-GALAXY CLASSIFICATION IN MULTI-BAND OPTICAL IMAGING

Ross Fadely; David W. Hogg; Beth Willman

Abstract

Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r ≳ 24. Star-galaxy separation poses a major challenge to such surveys because galaxies, even very compact galaxies, outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM < 0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training data to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVMbest) where the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics and (2) a more realistic scenario (SVMreal) where training is done on higher signal-to-noise data at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering ~80% completeness, with purity of ~60%-90% for both stars and galaxies. We find that no algorithm delivers perfect performance and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the Receiver Operating Characteristic curve, we find a best-to-worst ranking of SVMbest, HB, ML, and SVMreal. We conclude, therefore, that a well-trained SVM will outperform template-fitting methods. However, a normally trained SVM performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.
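
The completeness and purity figures quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: completeness is the fraction of true members of a class that are recovered, and purity is the fraction of sources assigned to a class that truly belong to it. The function name and the toy labels below are hypothetical and are not taken from the paper or from the COSMOS data.

```python
def completeness_and_purity(true_labels, predicted_labels, target):
    """Return (completeness, purity) for the class named `target`.

    Assumed definitions (standard usage, not quoted from the paper):
      completeness = correctly classified targets / all true targets
      purity       = correctly classified targets / all predicted targets
    """
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == target and p == target)
    n_true = sum(1 for t, _ in pairs if t == target)
    n_pred = sum(1 for _, p in pairs if p == target)
    # Guard against empty classes rather than dividing by zero.
    completeness = correct / n_true if n_true else float("nan")
    purity = correct / n_pred if n_pred else float("nan")
    return completeness, purity


# Toy catalog (hypothetical): 3 true stars and 5 true galaxies.
true_labels = ["star", "star", "star",
               "galaxy", "galaxy", "galaxy", "galaxy", "galaxy"]
predicted_labels = ["star", "star", "galaxy",
                    "star", "galaxy", "galaxy", "galaxy", "galaxy"]

star_c, star_p = completeness_and_purity(true_labels, predicted_labels, "star")
gal_c, gal_p = completeness_and_purity(true_labels, predicted_labels, "galaxy")
print(f"stars:    completeness={star_c:.2f}, purity={star_p:.2f}")
print(f"galaxies: completeness={gal_c:.2f}, purity={gal_p:.2f}")
```

Because galaxies outnumber stars at faint magnitudes, even a small fraction of misclassified galaxies can noticeably lower the purity of a star sample, which is why both numbers are reported per class.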
