A semi-supervised approach using label propagation to support citation screening

Abstract

Keywords: Active learning, Label propagation, Citation screening, Semi-supervised learning, Text classification

Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During the screening task, analysts manually assign a label to each citation to designate whether it is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant number of manually labelled citations for the text classifier to achieve robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations to improve the classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should also be included). To calculate the similarity between labelled and unlabelled citations, we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to reviews from the clinical and public health domains. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.
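The core idea in the abstract, copying labels from manually screened citations to similar unscreened ones and then training on the combined set, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simple nearest-neighbour rule over bag-of-words cosine similarity as a stand-in for label propagation, and the function names, the `threshold` parameter, and the toy citations are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a citation's text to a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(labelled, unlabelled, threshold=0.5):
    """Give each unlabelled citation the label of its most similar
    labelled citation, but only if that similarity reaches `threshold`
    (a hypothetical cut-off, not a value taken from the paper).

    labelled:   list of (text, label) pairs screened by a human
    unlabelled: list of texts not yet screened
    Returns the automatically labelled (text, label) pairs.
    """
    if not labelled:
        return []
    labelled_vecs = [(bag_of_words(t), y) for t, y in labelled]
    auto = []
    for text in unlabelled:
        vec = bag_of_words(text)
        best_sim, best_label = max(
            ((cosine(vec, lv), y) for lv, y in labelled_vecs),
            key=lambda pair: pair[0],
        )
        if best_sim >= threshold:
            auto.append((text, best_label))
    return auto


# Toy data (invented for illustration): 1 = include, 0 = exclude.
labelled = [
    ("statin therapy randomized trial cholesterol", 1),
    ("protein folding simulation molecular dynamics", 0),
]
unlabelled = [
    "randomized trial of statin therapy",
    "molecular dynamics simulation of protein",
    "history of jazz music",
]

auto_labelled = propagate_labels(labelled, unlabelled)
# Augmented training set: manual labels plus propagated labels.
augmented = labelled + auto_labelled
```

In this toy run the first two unlabelled citations inherit labels from their close neighbours, while the unrelated third citation stays unlabelled because no labelled citation is similar enough. A classifier would then be trained on `augmented` rather than on `labelled` alone; swapping the bag-of-words vectors for a spectral embedding would change only the feature space in which similarity is measured.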