Journal: International Journal of Population Data Science

Establishing an International Data Linkage Repository Workgroup Toward a Benchmarking Repository



Abstract

Introduction: Access to real data with diverse attributes is critical to the effective development of any data analytic algorithm. Benchmarking data repositories have been vital to the growth of research communities focused on algorithm development. This work reports on the development of such a data repository for record linkage.

Objectives and Approach: Establishing a common benchmarking repository of real data can propel a field to the next level of rigor by facilitating comparison of different algorithms, clarifying which types of algorithms work best under particular real data conditions and problem domains, promoting transparency and replicability of research, and creating incentives for proper citation of contributions. In addition, benchmarking repositories can bring together diverse stakeholders (e.g., computer scientists, statisticians, data custodians, and data users, including social, behavioural, economic, and health (SBEH) scientists), who together can advance the field more effectively than researchers from any single discipline.

Results: In Fall 2016, international leaders in record linkage formed a Data Linkage Repository workgroup (DLRep) to establish a benchmarking data repository for record linkage. The workgroup is collaborating with the Inter-university Consortium for Political and Social Research (ICPSR) to host the data repository, which is planned for release in Summer 2018. The repository for record linkage research will house various types of real data that require linking, with metadata, unique handles for citations, proposed algorithms for evaluation criteria, and a platform for posting, sharing, and comparing results as well as citations of relevant papers. Some datasets will have a published gold standard against which researchers can evaluate their results. For other datasets, results will be gathered to build the gold standard as a community.

Conclusion/Implications: Record linkage methodology is important in any domain, across diverse disciplines, where data must be integrated from multiple sources. Establishing an international, interdisciplinary research community around a benchmark data linkage repository to validate and compare linkage algorithms is crucial to fully realizing the social benefits of data about people.
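
The abstract says that some datasets will come with a published gold standard against which researchers can evaluate their results. As a minimal sketch of what such an evaluation might look like, the Python snippet below scores a set of predicted record pairs against a gold-standard set using pairwise precision, recall, and F1. These metrics are a common choice in record linkage and are assumed here; the abstract does not specify the repository's evaluation criteria. The function names and the record identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Treat (a, b) and (b, a) as the same link and drop duplicates."""
    return {tuple(sorted(pair)) for pair in pairs}


def evaluate_links(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Dict[str, float]:
    """Compute pairwise precision, recall, and F1 against a gold standard.

    Assumption: both inputs are lists of record-ID pairs, one pair per link.
    """
    pred = normalize(predicted)
    truth = normalize(gold)
    true_positives = len(pred & truth)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical record IDs, for illustration only.
gold_links = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
predicted_links = [("a1", "b1"), ("b2", "a2"), ("a3", "b9")]

scores = evaluate_links(predicted_links, gold_links)
print(scores)
```

In this toy example two of the three predicted links appear in the gold standard, so precision is 2/3 and recall is 2/4. A shared scoring routine of this kind is one way results from different algorithms could be compared on the same dataset.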
