Journal: Computer Science & Information Technology

Evaluating Dutch Named Entity Recognition and De-Identification Methods in the Human Resource Domain


Abstract

The human resource (HR) domain contains various types of privacy-sensitive textual data, such as e-mail correspondence and performance appraisals. Doing research on these documents brings several challenges, one of which is anonymisation. In this paper, we evaluate the current Dutch text de-identification methods for the HR domain in three steps. First, we update one of these methods with the latest named entity recognition (NER) models. The result is that the NER model based on the CoNLL 2002 corpus, in combination with the BERTje transformer, performs best for suppressing persons (recall 0.94) and locations (recall 0.82). For suppressing gender, DEDUCE performs best (recall 0.53). Second, the NER evaluation is based on strict de-identification of entities (a person must be suppressed as a person). Third, the evaluation is based on a loose sense of de-identification (it does not matter how a person is suppressed, as long as it is suppressed).
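
The difference between the strict and the loose evaluation can be illustrated with a minimal sketch. It is not the authors' evaluation code: the span representation, the label names `PER` and `LOC`, the function names, and the rule that a gold entity counts as suppressed only when a predicted span fully covers it are all assumptions made for this example, and the spans are invented toy data.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, end offset exclusive, entity label).
Span = Tuple[int, int, str]


def covers(pred: Span, gold: Span) -> bool:
    """Assumption: an entity is suppressed only if a predicted span fully covers it."""
    return pred[0] <= gold[0] and gold[1] <= pred[1]


def recall(gold: List[Span], pred: List[Span], strict: bool) -> float:
    """Fraction of gold entities that are suppressed.

    strict=True  -> the covering prediction must carry the same label as the gold entity.
    strict=False -> any covering prediction counts, whatever its label.
    """
    if not gold:
        return 0.0
    hits = 0
    for g in gold:
        for p in pred:
            if covers(p, g) and (not strict or p[2] == g[2]):
                hits += 1
                break  # count each gold entity at most once
    return hits / len(gold)


# Toy data (invented): two persons and one location in the gold annotation.
gold_spans = [(0, 12, "PER"), (20, 29, "LOC"), (35, 45, "PER")]
# The system finds the first person, mislabels the location, and misses the second person.
pred_spans = [(0, 12, "PER"), (20, 29, "PER"), (50, 55, "LOC")]

strict_recall = recall(gold_spans, pred_spans, strict=True)
loose_recall = recall(gold_spans, pred_spans, strict=False)
print(f"strict recall: {strict_recall:.2f}")
print(f"loose recall:  {loose_recall:.2f}")
```

In this toy case the mislabelled location is a miss under the strict evaluation but a hit under the loose one, so loose recall can never be lower than strict recall for the same predictions.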
