Petabyte Scale Data Mining: Dream or Reality?

Abstract

Science is becoming very data-intensive. Today's astronomy datasets, with tens of millions of galaxies, already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach petabytes. Imagine having a 100 GB database in 1996, when disk scanning speeds were 30 MB/s and database tools were immature. Today such a task is trivial, almost manageable with a laptop. We think that a PB database will be in a very similar position six years from now. In this paper we scale our current experiments in data archiving and analysis on the Sloan Digital Sky Survey data six years into the future. We analyze these projections and examine the requirements of performing data mining on such datasets. We conclude that the task scales rather well: we could do the job today, although it would be expensive. There do not seem to be any show-stoppers that would prevent us from storing and using a petabyte dataset six years from now.
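
The scaling argument above rests on simple bandwidth arithmetic. The following Python sketch, which is not taken from the paper, computes how long one sequential pass over a dataset takes at a given scan rate. The 100 GB size and 30 MB/s rate come from the abstract; the 1 PB single-disk case and the 1000-disk parallel case are hypothetical illustrations, and the model assumes decimal units and ideal linear scaling across disks with no seek or coordination overhead.

```python
# Sequential scan-time arithmetic, using decimal units (1 GB = 10**9 bytes).
# Only the 100 GB size and 30 MB/s rate come from the abstract; the 1 PB and
# 1000-disk figures below are hypothetical, chosen purely for illustration.

MB = 10**6
GB = 10**9
PB = 10**15


def scan_seconds(size_bytes: float, rate_bytes_per_s: float, n_disks: int = 1) -> float:
    """Seconds to read size_bytes once, assuming n_disks each stream at
    rate_bytes_per_s in parallel with perfect (ideal) load balancing."""
    if rate_bytes_per_s <= 0 or n_disks < 1:
        raise ValueError("rate must be positive and n_disks must be >= 1")
    return size_bytes / (rate_bytes_per_s * n_disks)


# 1996 scenario from the abstract: 100 GB scanned at 30 MB/s on one disk.
t_1996 = scan_seconds(100 * GB, 30 * MB)

# Hypothetical: 1 PB at the same per-disk rate, on one disk vs. 1000 disks.
t_pb_one = scan_seconds(1 * PB, 30 * MB)
t_pb_many = scan_seconds(1 * PB, 30 * MB, n_disks=1000)

print(f"100 GB @ 30 MB/s, 1 disk:     {t_1996 / 60:.1f} minutes")
print(f"1 PB   @ 30 MB/s, 1 disk:     {t_pb_one / 86400:.0f} days")
print(f"1 PB   @ 30 MB/s, 1000 disks: {t_pb_many / 3600:.1f} hours")
```

Under these assumptions the 1996 scan takes roughly 56 minutes, while a single 30 MB/s disk would need more than a year to read 1 PB once; spreading the scan across many disks in parallel is what brings a petabyte pass back into the range of hours.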