LegalBERT-th: Development of Legal QA Dataset and Automatic Question Tagging

Abstract

Tagging questions according to their topics is useful for internet forum management. In this paper, we use the Bidirectional Encoder Representations from Transformers (BERT) model to categorize posts from Thai legal internet forums. First, we construct a new legal Q&A dataset by scraping the internet, then cleaning and annotating the data. Second, we apply transfer learning so that the model first learns legal language in general, and then fine-tune it for the law topic classification task. As a result, we have developed a legal Q&A dataset of 12,695 question/answer pairs and a BERT-based law topic classification model with 92% accuracy. Finally, we build a prototype legal internet forum equipped with an automatic tagging function based on law topic classification, providing a concrete example of how to apply the model in a real setting.
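The automatic tagging step and the accuracy figure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `keyword_classifier` is a toy stand-in for the fine-tuned BERT classifier, and the topic names, keyword lists, and the function names `tag_post` and `accuracy` are all hypothetical, since the abstract does not list the actual law topics or the interface used.

```python
import re
from typing import Callable, Dict, List

Scores = Dict[str, float]

# Hypothetical topics and keywords; the abstract does not name the real law topics.
KEYWORDS: Dict[str, List[str]] = {
    "family": ["divorce", "custody", "marriage"],
    "labor": ["employer", "salary", "dismissal"],
    "property": ["land", "lease", "mortgage"],
}


def keyword_classifier(text: str) -> Scores:
    """Toy stand-in for a fine-tuned classifier: normalized keyword counts per topic."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {
        topic: sum(words.count(keyword) for keyword in keywords)
        for topic, keywords in KEYWORDS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No keyword matched: fall back to a uniform distribution over topics.
        return {topic: 1.0 / len(counts) for topic in counts}
    return {topic: count / total for topic, count in counts.items()}


def tag_post(text: str, classify: Callable[[str], Scores]) -> str:
    """Return the highest-scoring topic as the tag for a forum post."""
    scores = classify(text)
    return max(scores, key=lambda topic: scores[topic])


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of posts whose predicted tag equals the annotated tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

Because `tag_post` only depends on a callable that maps text to per-topic scores, the stand-in could in principle be swapped for a real model's prediction function without changing the forum-side code.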

Bibliographic details

Source: International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2021, pp. 1159-1162 (4 pages)
Authors: Kannika Wiratchawa; Tanutcha Khunthong; Thanapong Intharah
Original format: PDF
Keywords: Law; Computational modeling; Bit error rate; Transfer learning; Prototypes; Tagging; Cleaning
