Proceedings of the 73rd ASIS&T Annual Meeting: Navigating Streams in an Information Ecosystem

A method to track dataset reuse in biomedicine: filtered GEO accession numbers in PubMed Central

Abstract

Reusing research data has important potential benefits: generative science and efficient resource use. Tracking the reuse of research datasets would allow us to understand whether the potential benefits are indeed realized, enable recognition of investigators who produce, annotate, and share useful data, and inform data sharing and reuse initiatives, tools, and policies. Unfortunately, the lack of clear attribution practices for data makes automated tracking of data reuse difficult. I present a method for tracking research data reuse that takes advantage of the community norms around gene expression microarray data sharing and the rich NCBI Entrez resources. Specifically, the full text of papers stored in PubMed Central is queried for accession numbers of datasets archived in NCBI's Gene Expression Omnibus (GEO) repository. Studies known to have created microarray data are excluded through automated filters and guided manual curation. MeSH terms attached to the data creation and data reuse studies provide additional information for analysis. Finally, I extrapolate the findings to all of PubMed. Automated portions of this method have been implemented in Python and are openly available. Although imperfect, this dataset is a valuable initial resource for research into patterns of data reuse.
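The core of the approach described in the abstract (find GEO accession numbers in full text, then exclude the studies that created the data) can be illustrated with a small sketch. This is not the author's implementation: the accession pattern (GSE and GDS prefixes followed by digits), the function names `find_geo_accessions` and `candidate_reuse`, the paper IDs, and the in-memory `papers` and `creators` dictionaries are all assumptions made for illustration. In practice the text would come from PubMed Central and the creator links from NCBI resources, and manual curation would still be needed.

```python
import re

# Assumed pattern: GEO series (GSE) and curated dataset (GDS) accessions.
GEO_ACCESSION = re.compile(r"\b(?:GSE|GDS)\d+\b")


def find_geo_accessions(full_text):
    """Return the sorted, de-duplicated GEO accessions mentioned in a text."""
    return sorted(set(GEO_ACCESSION.findall(full_text)))


def candidate_reuse(papers, creators):
    """Map paper ID -> accessions it mentions but is not known to have created.

    papers:   dict of paper ID -> full text (hypothetical input format)
    creators: dict of accession -> set of paper IDs known to have created it
    """
    reuse = {}
    for paper_id, text in papers.items():
        reused = [
            acc
            for acc in find_geo_accessions(text)
            if paper_id not in creators.get(acc, set())
        ]
        if reused:
            reuse[paper_id] = reused
    return reuse


# Made-up example texts and IDs, for illustration only.
papers = {
    "PMC_A": "Raw data were deposited in GEO under accession GSE1000.",
    "PMC_B": "We reanalyzed the public dataset GSE1000 together with GDS200.",
}
creators = {"GSE1000": {"PMC_A"}}

result = candidate_reuse(papers, creators)
print(result)
```

Here `PMC_A` is filtered out because it is recorded as the creator of the only dataset it mentions, while `PMC_B` remains as a candidate reuse study. A simple filter like this only covers the automated step; it cannot tell whether an unlisted paper also created the data, which is where guided manual curation comes in.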