Conference: IEEE International Conference on Big Data

CNN Application in Detection of Privileged Documents in Legal Document Review

Abstract

Protecting privileged communications and data from disclosure is paramount for legal teams. Legal advice, such as attorney-client communications or litigation strategy, is typically exempt from disclosure in litigation or regulatory events and is vital to the attorney-client relationship. To protect this information from disclosure, companies and outside counsel often review vast numbers of documents to determine which ones contain privileged material. This process is extremely costly and time-consuming. As data volumes increase, legal counsel normally employ methods to reduce the number of documents requiring review while still ensuring that privileged information is protected. Keyword searching is commonly relied upon to target privileged information and reduce document review populations. Keyword searches are effective at casting a wide net but often return overly inclusive results, most of which do not contain privileged information. To overcome the weaknesses of keyword searching, legal teams are increasingly using machine learning techniques to target privileged information. In prior studies, classic text classification techniques have been applied to build classification models that identify privileged documents. In this paper, the authors propose a different method that applies convolutional neural network (CNN) techniques to identify privileged documents. The proposed method combines keyword searching with CNNs: for each keyword term, a CNN model is created using the context of the occurrences of that keyword. In addition, a method is proposed for selecting reliable privileged (positive) training keyword occurrences from labeled positive training documents. Extensive experiments were conducted, and the results show that the proposed methods can significantly reduce false positives while still capturing most of the true positives.
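
The abstract says each keyword's model is built from "the context of the occurrences of the keyword" but does not say how that context is extracted. As a rough illustration only, the sketch below collects a fixed-size token window around every occurrence of a single-word keyword. The function name `keyword_contexts`, the regex tokenizer, and the `window` parameter are all assumptions made for this example, not details from the paper. Each returned window would be one candidate input to a per-keyword classifier.

```python
import re
from typing import List


def keyword_contexts(text: str, keyword: str, window: int = 5) -> List[List[str]]:
    """Return one token window per occurrence of `keyword` in `text`.

    Hypothetical helper: the tokenizer and window size are assumptions,
    not taken from the paper. Only single-word keywords are handled.
    """
    # Lowercase and split into simple alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    key = keyword.lower()
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == key:
            # Keep up to `window` tokens on each side of the keyword.
            start = max(0, i - window)
            contexts.append(tokens[start:i + window + 1])
    return contexts


doc = "Please ask our counsel before signing. Outside counsel reviewed the draft."
contexts = keyword_contexts(doc, "counsel", window=2)
for ctx in contexts:
    print(" ".join(ctx))
```

Working at the level of occurrences rather than whole documents means a keyword hit can be judged by its own surroundings, which is how the abstract describes reducing the false positives that plain keyword search returns.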