Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r ≳ 24. Star-galaxy separation poses a major challenge to such surveys because galaxies, even very compact ones, outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM < 0.2 arcsec. We consider unsupervised spectral energy distribution (SED) template fitting and supervised, data-driven support vector machines (SVMs). For template fitting, we use a maximum likelihood (ML) method and a new hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVMs require training data to classify unknown sources; ML and HB do not. We consider (1) a best-case scenario (SVMbest) in which the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics, and (2) a more realistic scenario (SVMreal) in which training is done on higher signal-to-noise data at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering ~80% completeness and ~60%–90% purity for both stars and galaxies. We find that no algorithm delivers perfect performance and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using receiver operating characteristic (ROC) curves, we find a best-to-worst ranking of SVMbest, HB, ML, and SVMreal. We therefore conclude that a well-trained SVM will outperform template-fitting methods, whereas an SVM trained under realistic conditions performs worse. Thus, HB template fitting may prove to be the optimal classification method in future surveys.
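The contrast between ML and HB template fitting can be illustrated with a minimal sketch, assuming Gaussian flux errors and a best-fit (profiled) amplitude per template. The four templates, their band fluxes, the star/galaxy mix, and all function names are hypothetical toy choices. The EM point estimate of the template prior is a simplified stand-in for a full hierarchical Bayesian treatment, not the paper's implementation. ML adopts the class of the single best-fitting template. HB first learns template prior probabilities from the unlabeled catalog, then sums posterior weight over each class's templates.

```python
import math
import random

# Hypothetical toy SEDs in five bands (u, g, r, i, z), arbitrary flux units.
# Illustrative shapes only, NOT real stellar or galaxy templates.
TEMPLATES = {
    "star_blue": ("star", [1.0, 0.9, 0.7, 0.6, 0.5]),
    "star_red": ("star", [0.2, 0.5, 0.8, 1.0, 1.1]),
    "gal_sb": ("galaxy", [0.6, 0.7, 0.8, 0.8, 0.9]),
    "gal_early": ("galaxy", [0.1, 0.3, 0.7, 1.0, 1.3]),
}

def log_likelihoods(flux, err):
    """Gaussian log-likelihood per template, amplitude set to its best-fit value."""
    w = [1.0 / e ** 2 for e in err]
    out = {}
    for name, (_, sed) in TEMPLATES.items():
        amp = sum(wi * f * s for wi, f, s in zip(w, flux, sed)) / sum(
            wi * s * s for wi, s in zip(w, sed))
        chi2 = sum(wi * (f - amp * s) ** 2 for wi, f, s in zip(w, flux, sed))
        out[name] = -0.5 * chi2
    return out

def _softmax(logp):
    """Normalize log-weights to probabilities without overflow."""
    m = max(logp.values())
    ex = {k: math.exp(v - m) for k, v in logp.items()}
    z = sum(ex.values())
    return {k: v / z for k, v in ex.items()}

def _log(x):
    return math.log(max(x, 1e-300))  # floor avoids log(0) if a weight underflows

def ml_classify(flux, err):
    """ML: adopt the class of the single best-fitting template."""
    ll = log_likelihoods(flux, err)
    return TEMPLATES[max(ll, key=ll.get)][0]

def fit_template_prior(catalog, n_iter=50):
    """Learn template prior probabilities from the unlabeled catalog via EM."""
    names = list(TEMPLATES)
    prior = {n: 1.0 / len(names) for n in names}
    lls = [log_likelihoods(f, e) for f, e in catalog]
    for _ in range(n_iter):
        totals = {n: 0.0 for n in names}
        for ll in lls:
            resp = _softmax({n: _log(prior[n]) + ll[n] for n in names})
            for n in names:
                totals[n] += resp[n]
        prior = {n: totals[n] / len(lls) for n in names}
    return prior

def hb_star_probability(flux, err, prior):
    """Posterior P(star), marginalizing over templates weighted by the learned prior."""
    ll = log_likelihoods(flux, err)
    post = _softmax({n: _log(prior[n]) + ll[n] for n in TEMPLATES})
    return sum(p for n, p in post.items() if TEMPLATES[n][0] == "star")

def completeness_purity(true, pred, cls):
    """Completeness = recovered / true members; purity = correct / classified as cls."""
    tp = sum(1 for t, p in zip(true, pred) if t == cls and p == cls)
    n_true = sum(1 for t in true if t == cls)
    n_pred = sum(1 for p in pred if p == cls)
    return (tp / n_true if n_true else float("nan"),
            tp / n_pred if n_pred else float("nan"))

def simulate(n, seed=0):
    """Draw a toy catalog in which galaxies outnumber stars (hypothetical mix)."""
    rng = random.Random(seed)
    names = list(TEMPLATES)
    catalog, labels = [], []
    for _ in range(n):
        name = rng.choices(names, weights=[0.15, 0.10, 0.45, 0.30])[0]
        cls, sed = TEMPLATES[name]
        amp = rng.uniform(1.0, 5.0)
        err = [0.3] * len(sed)
        flux = [amp * s + rng.gauss(0.0, e) for s, e in zip(sed, err)]
        catalog.append((flux, err))
        labels.append(cls)
    return catalog, labels

if __name__ == "__main__":
    catalog, labels = simulate(1000)
    prior = fit_template_prior(catalog)
    ml_pred = [ml_classify(f, e) for f, e in catalog]
    hb_pred = ["star" if hb_star_probability(f, e, prior) > 0.5 else "galaxy"
               for f, e in catalog]
    for method, pred in (("ML", ml_pred), ("HB", hb_pred)):
        for cls in ("star", "galaxy"):
            c, p = completeness_purity(labels, pred, cls)
            print(f"{method} {cls}: completeness={c:.2f} purity={p:.2f}")
```

Neither classifier uses labels: `labels` enter only through `completeness_purity`. An SVM, by contrast, needs a labeled training set, and how representative that set is of the target data is what separates the SVMbest and SVMreal scenarios.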