Batch-mode active learning for technology-assisted review

Abstract

In recent years, technology-assisted review (TAR) has become an increasingly important component of the document review process in litigation discovery. This is driven largely by dramatic growth in the volume of data associated with many matters and investigations. Potential review populations frequently exceed several hundred thousand documents, and document counts in the millions are not uncommon. Budgetary and time constraints often make a traditional linear review of such populations impractical, if not impossible; this has made "predictive coding" the most discussed TAR approach in recent years. A key challenge in any predictive coding approach is striking the right balance in training the system: the goal is to minimize the time that subject matter experts spend training the system while ensuring that they perform enough training to achieve acceptable classification performance over the entire review population. Recent research demonstrates that support vector machines (SVMs) perform very well at finding a compact yet effective training set iteratively through batch-mode active learning. However, this research is limited, and it has not produced a principled approach for determining when the active learning process has stabilized. In this paper, we propose and compare several batch-mode active learning methods integrated within the SVM learning algorithm. We also propose methods for determining the stabilization of the active learning process. Experimental results on a set of large-scale, real-life legal document collections show that our methods outperform existing methods for this task.
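
The abstract does not spell out the paper's selection or stopping rules, so the following is only a generic sketch of the kind of loop it describes, not the authors' method. It assumes a linear SVM trained with a Pegasos-style subgradient solver (swapped in so the example needs no libraries), a "simple margin" batch selector that sends the k unlabeled documents closest to the hyperplane to the reviewer, and a stabilization check in the spirit of the "stabilizing predictions" heuristic: Cohen's kappa between successive models' predictions on a fixed stop set. All function names, the kappa threshold of 0.99, the patience of 3 rounds, and the other defaults are illustrative assumptions.

```python
import random


def _aug(x):
    # Append a constant 1.0 so the bias is learned (and regularized) as an ordinary weight.
    return list(x) + [1.0]


def score(w, x):
    """Signed decision value w . [x, 1]; its magnitude is proportional to distance from the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, _aug(x)))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss (y in {-1, +1})."""
    rng = random.Random(seed)
    data = [(_aug(x), yi) for x, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: move toward the example
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def select_batch(w, X, candidates, k):
    """Simple-margin batch selection: the k unlabeled documents closest to the hyperplane."""
    return sorted(candidates, key=lambda i: abs(score(w, X[i])))[:k]


def cohens_kappa(a, b):
    """Cohen's kappa between two {-1, +1} prediction vectors."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa = sum(x == 1 for x in a) / n
    pb = sum(x == 1 for x in b) / n
    pe = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)


def run_active_learning(X, oracle, seed_ids, stop_ids, batch_size=10,
                        kappa_threshold=0.99, patience=3, max_rounds=30):
    """Label batches until predictions on a fixed stop set stop changing (or the round budget runs out)."""
    labels = {i: oracle(i) for i in seed_ids}
    prev, stable = None, 0
    for rounds in range(1, max_rounds + 1):
        ids = sorted(labels)
        w = train_linear_svm([X[i] for i in ids], [labels[i] for i in ids])
        preds = [1 if score(w, X[i]) >= 0 else -1 for i in stop_ids]
        # Stabilization check: do successive models agree on the stop set?
        if prev is not None and cohens_kappa(prev, preds) >= kappa_threshold:
            stable += 1
            if stable >= patience:
                break
        else:
            stable = 0
        prev = preds
        pool = [i for i in range(len(X)) if i not in labels]
        batch = select_batch(w, X, pool, batch_size)
        if not batch:
            break
        for i in batch:
            labels[i] = oracle(i)  # the reviewer codes each selected document
    # w is the most recently trained model; rounds is how many training rounds ran.
    return w, labels, rounds
```

Calling `run_active_learning(X, oracle, seed_ids, stop_ids)` with a callback that returns a reviewer's code for a document index yields the final weights, the coded labels, and the number of rounds run. Picking purely by margin tends to fill a batch with near-duplicate documents, which is why diversity-aware batch selection is a common refinement; the stop set should be a random sample drawn once and held fixed so that successive kappa values are comparable.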

Bibliographic details

Source
IEEE International Congress on Big Data | 2015 | pp. 1134-1143 | 10 pages
Authors
Saha Tanay Kumar; Hasan Mohammad Al; Burgess Chandler; Habib Md Ahsan; Johnson Jeff
Original format: PDF

Similar literature

1. Technology-assisted risk of bias assessment in systematic reviews: a prospective cross-sectional evaluation of the RobotReviewer machine learning tool [J]. Allison Gates, Ben Vandermeer, Lisa Hartling. Journal of Clinical Epidemiology, 2018.

2. Automatic Tuning of the RBF Kernel Parameter for Batch-Mode Active Learning Algorithms: A Scalable Framework [J]. Chin-Chun Chang, Hsin-Ta Huang. IEEE Transactions on Cybernetics, 2019, no. 12.

3. Evolutionary Strategy to Perform Batch-Mode Active Learning on Multi-Label Data [J]. Oscar Reyes, Sebastian Ventura. ACM Transactions on Intelligent Systems, 2018, no. 4.

4. Batch-mode active learning for technology-assisted review [C]. Tanay Kumar Saha, Mohammad Al Hasan, Chandler Burgess. IEEE International Congress on Big Data, 2015.

5. A quantitative comparison of technology-assisted blended versus targeted instruction to address learning style differences [D]. Douglas A. Leas, 2015.

6. Technology-assisted title and abstract screening for systematic reviews: a retrospective evaluation of the Abstrackr machine learning tool [O]. Allison Gates, Cydney Johnson, Lisa Hartling, 2018.

7. Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review [O]. Gordon V. Cormack, Maura R. Grossman, 2015.

