LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text

Abstract

Named entity recognition systems have untapped potential to extract information from legal documents, which can improve information retrieval and decision-making processes. In this paper, a dataset for named entity recognition in Brazilian legal documents is presented. Unlike other Portuguese language datasets, this dataset is composed entirely of legal documents. In addition to tags for persons, locations, time entities and organizations, the dataset contains specific tags for legislation and legal case entities. To establish a set of baseline results, we first performed experiments on another Portuguese dataset: Paramopama. This evaluation demonstrates that LSTM-CRF gives results that are significantly better than those previously reported. We then retrained LSTM-CRF on our dataset and obtained F1 scores of 97.04% and 88.82% for Legislation and Legal case entities, respectively. These results show the viability of the proposed dataset for legal applications.
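
The per-entity F1 scores quoted above can be illustrated with a minimal sketch. The abstract does not state the evaluation protocol, so this assumes exact-match, span-level scoring over IOB2 tags, which is a common convention for named entity recognition. The tag names `LEGISLACAO` and `JURISPRUDENCIA` and the helper functions are illustrative assumptions, not taken from this record.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from an IOB2 tag sequence."""
    spans = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.add((etype, start, i))
            start, etype = None, None
        # A "B-" tag always opens a span; a stray "I-" opens one leniently.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, etype = i, tag[2:]
    return spans


def f1_per_type(gold_tags, pred_tags):
    """Exact-match span F1 for each entity type (assumed protocol)."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    scores = {}
    for etype in {span[0] for span in gold | pred}:
        g = {s for s in gold if s[0] == etype}
        p = {s for s in pred if s[0] == etype}
        tp = len(g & p)  # spans with identical type and boundaries
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        denom = precision + recall
        scores[etype] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical tag names, used only for illustration.
gold = ["B-LEGISLACAO", "I-LEGISLACAO", "O",
        "B-JURISPRUDENCIA", "I-JURISPRUDENCIA", "O", "B-LEGISLACAO"]
pred = ["B-LEGISLACAO", "I-LEGISLACAO", "O",
        "B-JURISPRUDENCIA", "O", "O", "B-LEGISLACAO"]
scores = f1_per_type(gold, pred)
```

Under exact matching, a predicted span with a wrong boundary counts as both a false positive and a false negative, so the truncated legal case span in this toy input earns no credit.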

Bibliographic details

Source
International Conference on Computational Processing of Portuguese, 2018, pp. 313-323 (11 pages)
Authors
Pedro Henrique Luz de Araujo; Teofilo E. de Campos; Renato R. R. de Oliveira; Matheus Stauffer; Samuel Couto; Paulo Bermejo
Original format
PDF
Keywords
Named entity recognition; Natural language processing; Portuguese processing
