Data reuse and the open data citation advantage

Abstract

Background. Attribution to the original contributor upon reuse of published data is important both as a reward for data creators and to document the provenance of research findings. Previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. However, few previous analyses have had the statistical power to control for the many variables known to predict citation rate, which has led to uncertain estimates of the “citation benefit”. Furthermore, little is known about patterns in data reuse over time and across datasets.

Method and Results. Here, we look at citation rates while controlling for many known citation predictors and investigate the variability of data reuse. In a multivariate regression on 10,555 studies that created gene expression microarray data, we found that studies that made data available in a public repository received 9% (95% confidence interval: 5% to 13%) more citations than similar studies for which the data was not made available. Date of publication, journal impact factor, open access status, number of authors, first and last author publication history, corresponding author country, institution citation history, and study topic were included as covariates. The citation benefit varied with date of dataset deposition: a citation benefit was most clear for papers published in 2004 and 2005, at about 30%. Authors published most papers using their own datasets within two years of their first publication on the dataset, whereas data reuse papers published by third-party investigators continued to accumulate for at least six years. To study patterns of data reuse directly, we compiled 9,724 instances of third-party data reuse via mention of GEO or ArrayExpress accession numbers in the full text of papers.
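
The accession-mining step lends itself to a short sketch: scan each paper's full text for accession numbers, keep those that match a deposited dataset, and count a mention as third-party reuse only when the citing paper shares no author with the dataset's creators. This is a minimal Python sketch under stated assumptions, not the authors' pipeline. The accession patterns (GEO `GSE`/`GDS` identifiers and ArrayExpress `E-XXXX-n` identifiers), the author-overlap rule for "third-party", and the names `Dataset`, `Paper`, `find_accessions`, `third_party_reuses` and `fraction_reused` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed accession formats (illustrative, not taken from the paper):
#   GEO series / curated datasets: GSE2034, GDS1234
#   ArrayExpress experiments:      E-MEXP-1234 (E-, four capital letters, -, digits)
ACCESSION_RE = re.compile(r"\b(GSE\d+|GDS\d+|E-[A-Z]{4}-\d+)\b")


@dataclass(frozen=True)
class Dataset:  # hypothetical record for a deposited dataset
    accession: str
    creator_authors: frozenset  # authors of the paper that deposited the data


@dataclass(frozen=True)
class Paper:  # hypothetical record for a candidate reusing paper
    pmid: str
    authors: frozenset
    full_text: str


def find_accessions(text):
    """Return the distinct accession numbers mentioned in a full text."""
    return set(ACCESSION_RE.findall(text))


def third_party_reuses(papers, datasets):
    """Yield (pmid, accession) pairs for mentions of known datasets by papers
    that share no author with the dataset creators (a simplified criterion)."""
    by_accession = {d.accession: d for d in datasets}
    for paper in papers:
        for acc in sorted(find_accessions(paper.full_text)):
            dataset = by_accession.get(acc)
            if dataset is None:
                continue  # mentioned accession is not in our deposit list
            if paper.authors.isdisjoint(dataset.creator_authors):
                yield paper.pmid, acc


def fraction_reused(papers, datasets):
    """Share of deposited datasets with at least one third-party reuse."""
    if not datasets:
        return 0.0
    reused = {acc for _, acc in third_party_reuses(papers, datasets)}
    return len(reused) / len(datasets)


if __name__ == "__main__":
    # Toy, hypothetical inputs for illustration only.
    datasets = [Dataset("GSE1000", frozenset({"Smith J"}))]
    papers = [
        Paper("1", frozenset({"Smith J"}), "Data are available as GSE1000."),
        Paper("2", frozenset({"Lee K"}), "We reanalysed GSE1000 and E-MEXP-42."),
    ]
    print(list(third_party_reuses(papers, datasets)))  # [('2', 'GSE1000')]
    print(fraction_reused(papers, datasets))  # 1.0
```

Matching on author name strings is a crude proxy for "third-party"; a real pipeline would also need to handle name variants and accessions that are only cited rather than reanalysed.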

The level of third-party data use was high: for 100 datasets deposited in year 0, we estimated that 40 papers in PubMed reused a dataset by year 2, 100 by year 4, and more than 150 data reuse papers had been published by year 5. Data reuse was distributed across a broad base of datasets: a very conservative estimate found that 20% of the datasets deposited between 2003 and 2007 had been reused at least once by third parties.

Conclusion. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data, although a smaller one than previously reported. We conclude there is a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. Other factors that may also contribute to the citation benefit are considered. We further conclude that, at least for gene expression microarray data, a substantial fraction of archived datasets are reused, and that the intensity of dataset reuse has been steadily increasing since 2003.
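
The citation benefit is reported as a percentage. If the regression was fit to log-transformed citation counts (an assumption here; the abstract does not state the model's functional form), a coefficient beta on the data-availability indicator corresponds to a multiplicative change of e^beta in citations, that is, a 100 * (e^beta - 1)% benefit. The minimal Python sketch below converts the reported 9% (95% CI: 5% to 13%) into the implied log-scale coefficients and back. The function names are hypothetical.

```python
import math


def coef_from_percent(pct):
    """Log-scale coefficient implied by a pct% multiplicative citation change."""
    return math.log1p(pct / 100.0)


def percent_from_coef(beta):
    """Percent citation change implied by coefficient beta on log(citations)."""
    return math.expm1(beta) * 100.0


# Reported point estimate and 95% CI bounds, in percent.
reported = {"estimate": 9.0, "ci_lower": 5.0, "ci_upper": 13.0}

for label, pct in reported.items():
    beta = coef_from_percent(pct)
    print(f"{label}: {pct:.0f}% -> beta = {beta:.4f} -> {percent_from_coef(beta):.1f}%")
```

Under this assumption, the interval is nearly symmetric about the estimate on the log scale (about 0.049 to 0.122 around 0.086). That is the shape a normal-approximation interval for a log-linear coefficient would have, and the small remaining asymmetry is plausibly due to rounding the reported percentages.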