Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. The varying statistical error of position measurements, moving and extended objects, and other physical properties make it necessary to perform the cross-identification using a mathematically correct, proper Bayesian probabilistic algorithm capable of including various priors. One-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.
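To make the idea of spatial partitioning by coordinate ranges concrete, the following is a minimal illustrative sketch and not the system described in the paper. It assumes in-memory lists of `(id, ra_deg, dec_deg)` tuples in place of database tables, equal-width declination bands as the partitions, and a fixed angular-distance cut as a stand-in for the Bayesian match criterion. All function names (`angular_separation_deg`, `dec_partitions`, `cross_match`) are hypothetical. The point it illustrates is that each partition can be processed independently, provided the candidate set is widened by the match radius so that pairs straddling a partition boundary are not lost.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def dec_partitions(n_parts):
    """Split the declination range [-90, 90] into n_parts equal-width bands."""
    width = 180.0 / n_parts
    return [(-90.0 + i * width, -90.0 + (i + 1) * width) for i in range(n_parts)]

def cross_match(cat_a, cat_b, radius_deg, n_parts=4):
    """Pair sources of cat_a with sources of cat_b closer than radius_deg.

    Assumption: a plain distance cut replaces the probabilistic criterion.
    Each declination band is handled independently, as a separate worker could.
    """
    matches = []
    parts = dec_partitions(n_parts)
    for i, (lo, hi) in enumerate(parts):
        is_last = i == len(parts) - 1
        # Each cat_a source belongs to exactly one band (half-open interval;
        # the last band also includes the pole at +90).
        own = [s for s in cat_a if lo <= s[2] < hi or (is_last and s[2] == hi)]
        # cat_b candidates come from the band widened by the match radius,
        # so pairs that straddle a band boundary are still found.
        cand = [s for s in cat_b if lo - radius_deg <= s[2] <= hi + radius_deg]
        for a in own:
            for b in cand:
                if angular_separation_deg(a[1], a[2], b[1], b[2]) <= radius_deg:
                    matches.append((a[0], b[0]))
    return matches
```

Because a pair within the match radius can differ in declination by at most that radius, widening each band by `radius_deg` is sufficient for the partitioned result to equal the unpartitioned one. The nested loop inside a band is deliberately naive; in a database setting that step would be an indexed join.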