Venue: Workshop on Scholarly Document Processing

Scaling Systematic Literature Reviews with Machine Learning Pipelines

Abstract

Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet the three main stages of a systematic review are easily done automatically: searching for documents can be done via APIs and scrapers, selection of relevant documents can be done via binary classification, and extraction of data can be done via sequence-labelling classification. Despite the promise of automation for this field, little research exists that examines the various ways to automate each of these tasks. We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs. We test the ability of classifiers to work well on small amounts of data and to generalise to data from countries not represented in the training data. We test different types of data extraction with varying difficulty in annotation, and five different neural architectures to do the extraction. We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation, which is only 15% of the time it takes to do the whole review manually and can be repeated and extended to new data with no additional effort.
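
The selection stage described in the abstract (deciding whether a retrieved document is relevant) can be illustrated with a small binary classifier. The sketch below is not the authors' method: the abstract does not name the classifier used for selection, so a toy multinomial Naive Bayes model with add-one smoothing is swapped in purely for illustration. The class name `NaiveBayesSelector`, the helper `tokenize`, and the four training sentences are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical helper: lowercase whitespace tokenisation.
    return text.lower().split()


class NaiveBayesSelector:
    """Toy multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_score(self, doc, c):
        # log P(c) + sum over known tokens of log P(token | c)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / total_docs)
        total_words = sum(self.word_counts[c].values())
        denom = total_words + len(self.vocab)
        for tok in tokenize(doc):
            if tok not in self.vocab:
                continue  # ignore tokens never seen in training
            score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Return the label with the highest log score.
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Invented training data: True = relevant to the review, False = not relevant.
train_docs = [
    "randomised trial of cash transfers on school enrolment",
    "impact evaluation of a microfinance programme in rural villages",
    "survey of deep learning methods for image segmentation",
    "a new compiler optimisation for loop unrolling",
]
train_labels = [True, True, False, False]

selector = NaiveBayesSelector().fit(train_docs, train_labels)
is_relevant = selector.predict("impact evaluation of cash transfers")
is_irrelevant = selector.predict("compiler optimisation for deep learning")
```

In a pipeline of the kind the abstract outlines, a classifier in this position would sit between the search stage and the extraction stage, passing on only the documents it labels relevant.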
