Journal: Genetics in Medicine

Toward modernizing the systematic review pipeline in genetics: Efficient updating via data mining


Abstract

Purpose: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews.

Methods: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as relevant or irrelevant using human screening as the gold standard.

Results: Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (100% sensitivity for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605/5,616 citations to update the PDGene registry (11%) and 555/7,298 (8%), 717/5,381 (13%), and 334/1,015 (33%) for the other three databases.

Conclusion: Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans.
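
The sensitivity and screening-workload percentages in the Results can be reproduced from the counts given in the abstract. The following is a minimal sketch of that arithmetic only, not the authors' classification pipeline; the names `UpdateResult`, `RESULTS`, and `summarize` are hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateResult:
    """Counts reported in the abstract for one 2010 update (hypothetical container)."""
    name: str
    eligible: int   # citations judged relevant by human screeners
    missed: int     # eligible citations the model classified as irrelevant
    to_read: int    # citations a human would still need to read
    total: int      # all citations retrieved for 2010

    @property
    def sensitivity(self) -> float:
        # Fraction of eligible citations that the model did not miss.
        return (self.eligible - self.missed) / self.eligible

    @property
    def screening_fraction(self) -> float:
        # Fraction of all citations left for manual screening.
        return self.to_read / self.total


# Figures taken directly from the abstract's Results.
RESULTS = [
    UpdateResult("PDGene", eligible=104, missed=0, to_read=605, total=5616),
    UpdateResult("AlzGene", eligible=65, missed=0, to_read=555, total=7298),
    UpdateResult("SzGene", eligible=179, missed=0, to_read=717, total=5381),
    UpdateResult("CEA Registry", eligible=79, missed=1, to_read=334, total=1015),
]


def summarize(results):
    """Return (name, sensitivity %, screening workload %) rounded to whole percents."""
    return [
        (r.name, round(100 * r.sensitivity), round(100 * r.screening_fraction))
        for r in results
    ]


if __name__ == "__main__":
    for name, sens, workload in summarize(RESULTS):
        print(f"{name}: sensitivity {sens}%, manual screening {workload}% of citations")
```

Rounded to whole percentages, these values match the sensitivities and workload fractions quoted in the abstract. Specificity is left out because the abstract reports it directly rather than giving the underlying true-negative counts.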
