Classification of SDSS photometric data using machine learning on a cloud

Acharya Vishwanath; Bora Piyush Singh; Navin Karri; Nazareth Anisha; Anusha P. S.; Rao Shrisha



Abstract

Astronomical datasets are typically very large, and manually classifying the data in them is effectively impossible. We use machine learning algorithms to classify (as stars, quasars and galaxies) more than one billion objects given photometrically in the Third Data Release of the Sloan Digital Sky Survey (SDSS-III). We used kNN, SVM and random forest algorithms in a distributed environment over the cloud to classify 1,183,850,913 unclassified photometric objects in the SDSS-III catalog. This catalog contains photometric data for all objects viewed through the telescope and spectroscopic data for a small fraction of them. Although all the objects could in principle be classified using spectroscopic data, it is impractical to obtain such data for every one of them. Classifying such a large dataset on a single machine would be impractically slow, so we used the Spark cluster computing framework to implement a distributed computing environment over the cloud. We found that writing results (dozens of gigabytes) to cloud storage is very slow when using kNN. Writing the results with SVM is faster because it is done in parallel, but its accuracy is only around 87%, because Spark lacks a kernel implementation of SVM. We then used the random forest algorithm to classify the entire set of 1,183,850,913 objects with an accuracy of 94% in about 17 hours of processing time. The result set is significant because even collecting spectroscopic data for this many objects would take decades, and our classifications can help astronomers and astrophysicists carry out further studies.
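
To give a sense of scale, the two figures quoted in the abstract (1,183,850,913 objects and about 17 hours) imply an average classification rate. The short Python sketch below works that out. The function name is illustrative only, and because the abstract gives the run time only approximately, the result is an order-of-magnitude estimate rather than a reported measurement.

```python
# Figures quoted in the abstract.
N_OBJECTS = 1_183_850_913  # photometric objects classified with random forest
HOURS = 17                 # "about 17 hours" of processing time (approximate)


def throughput_per_second(n_objects: int, hours: float) -> float:
    """Average number of objects classified per second of processing time."""
    return n_objects / (hours * 3600)


rate = throughput_per_second(N_OBJECTS, HOURS)
print(f"{rate:,.0f} objects per second")  # prints "19,344 objects per second"
```

This is an average over the whole run across the cluster; it says nothing about per-node speed, which the abstract does not report.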


Bibliographic details

Source
Current Science: A Fortnightly Journal of Research | 2018, Issue 2 | 9 pages
Authors
Acharya Vishwanath; Bora Piyush Singh; Navin Karri; Nazareth Anisha; Anusha P. S.; Rao Shrisha
Author affiliations

Int Inst Informat Technol Bangalore, 26-C Elect City, Bengaluru 560100, India (listed for each of the six authors)
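Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: Current status and development of natural science
Keywords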
Astronomical data; classification; cloud computing; distributed algorithms; machine learning;


Similar literature

1. Classification of SDSS photometric data using machine learning on a cloud [J]. Acharya Vishwanath, Bora Piyush Singh, Navin Karri. Current Science: A Fortnightly Journal of Research. 2018, Issue 2
2. Automated rebar diameter classification using point cloud data based machine learning [J]. Min-Koo Kim, Julian Pratama Putra Thedja, Hung-Lin Chi. BM. 2021, Issue 1
3. Big data and machine learning framework for clouds and its usage for text classification [J]. Pintye Istvan, Kail Eszter, Kacsuk Peter. Concurrency and computation: practice and experience. 2021, Issue 19
4. A Machine Learning Algorithm TsF K-NN Based on Automated Data Classification for Securing Mobile Cloud Computing Model [C]. Anunaya Inani, Chakradhar Verma, Suvrat Jain. IEEE International Conference on Computer and Communication Systems. 2019
5. Semi-Supervised Machine Learning Techniques for Classification of Evolving Data in Pattern Recognition = TECHNIQUES SEMI-SUPERVISÉES D'APPRENTISSAGE MACHINE POUR LA CLASSIFICATION DES DONNÉES EN ÉVOLUTION EN RECONNAISSANCE DE FORMES [D]. Tencer, Lukas. 2017
6. Securing Machine Learning in the Cloud: A Systematic Review of Cloud Machine Learning Security [O]. Adnan Qayyum, Aneeqa Ijaz, Muhammad Usama. 2020
7. Robust Machine Learning Applied to Astronomical Datasets III: Probabilistic Photometric Redshifts for Galaxies and Quasars in the SDSS and GALEX [O]. Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D. 2008
