Journal: Mathematical Problems in Engineering: Theory, Methods and Applications

Automatic Generation of the Draft Procuratorial Suggestions Based on an Extractive Summarization Method: BERTSLCA


Abstract

The automatic generation of draft procuratorial suggestions requires extracting the description of illegal facts, the administrative omission, the relevant laws and regulations, and other information from case documents. Existing deep learning methods for legal domain-specific extractive summarization have mainly relied on context-free word embeddings, which cannot capture a sufficiently rich semantic understanding of the text and in turn degrade summarization performance. To this end, we propose a novel deep contextualized embedding-based method, BERTSLCA, for the extractive summarization task. The model is built on the BERT variant BERTSUM. First, the input document is fed into BERTSUM to obtain sentence-level embeddings. Then, we design an extraction architecture that captures long-range dependencies between sentences with a Bidirectional Long Short-Term Memory (Bi-LSTM) unit; at the end of the architecture, three cascaded convolution kernels of different sizes extract the relationships between adjacent sentences. Finally, we introduce an attention mechanism to strengthen the model's ability to distinguish the importance of different sentences. To the best of our knowledge, this is the first work to use a pretrained language model for extractive summarization in the field of Chinese judicial litigation. Experimental results on public interest litigation data and the CAIL 2020 dataset demonstrate that the proposed method achieves competitive performance.
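The abstract describes a pipeline of BERTSUM sentence embeddings, a Bi-LSTM over sentences, cascaded convolutions, and an attention layer that scores sentences for extraction. The following is a minimal sketch of such an extractive head in PyTorch with Hugging Face Transformers; the checkpoint name, hidden sizes, and kernel sizes are illustrative assumptions, not the authors' reported configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class ExtractiveHead(nn.Module):
    """Sketch of a BERTSUM-style extractive summarizer with Bi-LSTM,
    cascaded convolutions, and sentence-level attention (assumed sizes)."""

    def __init__(self, hidden=768, lstm_hidden=256):
        super().__init__()
        # "bert-base-chinese" is an assumed checkpoint for Chinese legal text.
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        # Bi-LSTM over sentence embeddings captures long-range dependencies.
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Three cascaded 1-D convolutions with different kernel sizes model
        # relationships between adjacent sentences.
        self.convs = nn.Sequential(
            nn.Conv1d(2 * lstm_hidden, 2 * lstm_hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(2 * lstm_hidden, 2 * lstm_hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(2 * lstm_hidden, 2 * lstm_hidden, kernel_size=7, padding=3),
            nn.ReLU(),
        )
        # Attention weights sentences by importance before scoring.
        self.attn = nn.Linear(2 * lstm_hidden, 1)
        self.scorer = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Token-level BERT outputs; the [CLS] token inserted before each
        # sentence (BERTSUM-style) serves as that sentence's embedding.
        token_states = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask).last_hidden_state
        batch_idx = torch.arange(token_states.size(0)).unsqueeze(1)
        sent_embs = token_states[batch_idx, cls_positions]   # (B, S, hidden)
        h, _ = self.bilstm(sent_embs)                         # (B, S, 2*lstm_hidden)
        h = self.convs(h.transpose(1, 2)).transpose(1, 2)     # adjacent-sentence features
        weights = torch.softmax(self.attn(h), dim=1)          # sentence importance
        scores = torch.sigmoid(self.scorer(weights * h))      # extraction probability
        return scores.squeeze(-1)                             # (B, S)
```

In this sketch, `cls_positions` is assumed to hold the token index of each sentence's [CLS] marker; the top-scoring sentences would be concatenated to form the draft procuratorial suggestion.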
