CNN Application in Detection of Privileged Documents in Legal Document Review

Abstract

Protecting privileged communications and data from disclosure is paramount for legal teams. Legal advice, such as attorney-client communications or litigation strategy, is typically exempt from disclosure in litigation or regulatory events and is vital to the attorney-client relationship. To protect this information from disclosure, companies and outside counsel often review vast numbers of documents to determine which contain privileged material. This process is extremely costly and time-consuming. As data volumes increase, legal counsel normally employ methods to reduce the number of documents requiring review while still ensuring the protection of privileged information. Keyword searching is relied upon as a method to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but often return overly inclusive results, most of which do not contain privileged information. To overcome the weaknesses of keyword searching, legal teams are increasingly using machine learning techniques to target privileged information. In prior studies, classic text classification techniques were applied to build classification models that identify privileged documents. In this paper, the authors propose a different method that applies convolutional neural network (CNN) techniques to identify privileged documents. The proposed method combines keyword searching with CNN. For each keyword term, a CNN model is created using the context of the occurrences of the keyword. In addition, a method is proposed to select reliable privileged (positive) training keyword occurrences from labeled positive training documents. Extensive experiments were conducted, and the results show that the proposed methods can significantly reduce false positives while still capturing most of the true positives.
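The idea of building a model from "the context of the occurrences of the keyword" can be illustrated with a minimal sketch of the context-extraction step only. This is not the authors' implementation: the function name `keyword_contexts`, the simple regex tokenizer, the restriction to single-word keywords, and the `window` size are all assumptions made for illustration, and the CNN itself is not shown.

```python
import re
from collections import defaultdict


def keyword_contexts(text, keywords, window=5):
    """Collect the tokens surrounding each occurrence of each keyword.

    Hypothetical helper: single-word keywords only, and `window` tokens
    on each side of the hit (an illustrative choice, not from the paper).
    Returns {keyword: [list of context token lists]}.
    """
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    targets = {k.lower() for k in keywords}
    contexts = defaultdict(list)
    for i, tok in enumerate(tokens):
        if tok in targets:
            start = max(0, i - window)
            # Slice includes the keyword itself plus its neighbours.
            contexts[tok].append(tokens[start:i + window + 1])
    return dict(contexts)


doc = ("Please ask our attorney to review the draft. "
       "The attorney said the contract is fine.")
result = keyword_contexts(doc, ["attorney", "privileged"], window=2)
for keyword, windows in result.items():
    for w in windows:
        print(keyword, "->", " ".join(w))
```

Under these assumptions, each keyword's list of context windows would form the training examples for that keyword's own classifier, with a label per occurrence rather than per document.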

Bibliographic details

Source
IEEE International Conference on Big Data, 2020, pp. 1485-1492 (8 pages)
Authors
Rishi Chhatwal; Robert Keeling; Peter Gronvall; Nathaniel Huber-Fliflet; Jianping Zhang; Haozhen Zhao
Original format
PDF
Keywords
Training; Law; Keyword search; Text categorization; Machine learning; Big Data; Statistics

Similar literature

1. Application of the Open Document Format for Legal Normative Documents in the National Assembly of Vietnam [J]. Le Quang Huy, Vilas Wuwongse. International Journal of Electronic Governance, 2012, no. 2.
2. Conceptual framework for document semantic modelling: an application to document and knowledge management in the legal domain [J]. D. Jouve, Y. Amghar, B. Chabbat. Data & Knowledge Engineering, 2003, no. 3.
3. Application of Zero-Knowledge Proof in Resolving Disputes of Privileged Documents in E-Discovery [J]. Yuqing Cui. Harvard Journal of Law and Technology, 2019, no. 2.
4. Experimental Evaluation of CNN Parameters for Text Categorization in Legal Document Review [C]. Qian Han, Yufeng Kou, Derek Snaidauf. IEEE International Conference on Big Data, 2019.
5. Predictive Coding Techniques With Manual Review to Identify Privileged Documents in E-Discovery [D]. Vinjumur, Jyothi K. 2018.
6. Documenting legal status: a systematic review of measurement of undocumented status in health research [O]. Maria-Elena De Trinidad Young, Daniel S. Madrigal. 2017.
7. Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review [O]. Robert Keeling, Rishi Chhatwal, Nathaniel Huber-Fliflet. 2019.
8. Diazinon: Position Document 1/2/3. Notice of Special Review and Preliminary Determination to Cancel Registration and Deny Applications for Certain Uses of Diazinon; Notice of Availability of Support Document [R]. 1990.
