
Repository of NSF Funded Publications and Data Sets: 'Back of Envelope' 15 year Cost Estimate


Abstract

In this back-of-envelope study we calculate the 15-year fixed and variable costs of setting up and running a data repository (or database) to store and serve the publications and datasets derived from research funded by the National Science Foundation (NSF). Costs are computed on a yearly basis using a fixed estimate of the number of papers published each year that list NSF as their funding agency. We assume each paper has one dataset and estimate the size of that dataset based on experience. By our estimates, the number of papers generated each year is 64,340. The average dataset size across all seven NSF directorates is 32 gigabytes (GB). The total amount of data added to the repository is two petabytes (PB) per year, or 30 PB over 15 years.

The architecture of the data/paper repository is based on a hierarchical storage model that combines fast disk for rapid access with tape for high reliability and cost-efficient long-term storage. Data are ingested through workflows of the kind used in university institutional repositories, which add metadata and ensure data integrity. The average fixed cost is approximately $0.90/GB over the 15-year span. Variable costs are estimated at a sliding scale of $150 to $100 per new dataset for up-front curation, or $4.87 to $3.22 per GB. Variable costs reflect a 3% annual decrease in curation costs, as efficiency gains and automated metadata and provenance capture are anticipated to help reduce what are now largely manual curation efforts.

The total projected cost of the data and paper repository is estimated at $167,000,000 over 15 years of operation, curating close to one million datasets and one million papers. After 15 years, with 30 PB of data accumulated and curated, we estimate the cost per gigabyte at $5.56.
This $167 million cost is a direct cost in that it does not include the federally allowable indirect cost return (ICR).

After 15 years, it is reasonable to assume that some datasets will be compressed and rarely accessed. Others may be deemed no longer valuable, e.g., because they have been replaced by more accurate results. Therefore, at some point the growth of data in the repository will need to be adjusted through strategic preservation.
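The headline figures above follow from simple arithmetic on the stated inputs. A minimal sanity-check sketch, assuming the decimal unit conversion 1 PB = 10^6 GB (a convention the abstract does not state explicitly):

```python
# Sanity check of the abstract's back-of-envelope arithmetic.
# All inputs are taken from the abstract; the only assumption
# here is the decimal conversion 1 PB = 1e6 GB.

PAPERS_PER_YEAR = 64_340      # NSF-funded papers per year, one dataset each
AVG_DATASET_GB = 32           # mean dataset size across the seven directorates
YEARS = 15
TOTAL_COST_USD = 167_000_000  # projected 15-year direct cost

yearly_pb = PAPERS_PER_YEAR * AVG_DATASET_GB / 1e6   # ~2 PB per year
total_pb = yearly_pb * YEARS                         # ~30 PB over 15 years
total_datasets = PAPERS_PER_YEAR * YEARS             # close to one million

# The abstract rounds the accumulated total to 30 PB when quoting
# its cost-per-gigabyte figure.
cost_per_gb = TOTAL_COST_USD / (30 * 1e6)

print(f"{yearly_pb:.2f} PB/year, {total_pb:.1f} PB over {YEARS} years")
print(f"{total_datasets:,} datasets, ${cost_per_gb:.2f}/GB")
```

The inputs reproduce the abstract's totals to within rounding: 64,340 papers at 32 GB each is about 2.06 PB/year and 30.9 PB over 15 years, and $167M spread over roughly 30 PB gives the quoted cost per gigabyte of about $5.56.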
