Venue: International Conference on Language Resources and Evaluation

A Real-World Data Resource of Complex Sensitive Sentences Based on Documents from the Monsanto Trial



Abstract

In this work we present a corpus for evaluating sensitive information detection approaches, addressing the need for real-world sensitive information in empirical studies. Our sentence corpus contains different notions of complex sensitive information that correspond to different aspects of concern in a current trial of the Monsanto company. This paper describes the annotation process, in which we employ human annotators and also create automatically inferred labels regarding technical, legal and informal communication within and with employees of Monsanto, drawing on a classification of documents by lawyers involved in the Monsanto court case. We release a corpus of high-quality sentences and parse trees with these two types of labels at the sentence level. We characterize the sensitive information via several representative sensitive information detection models, in particular keyword-based (n-gram) approaches and recent deep learning models, namely recurrent neural networks (LSTM) and recursive neural networks (RecNN). Data and code are made publicly available.
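To make the keyword-based (n-gram) family of detectors mentioned in the abstract more concrete, the following is a minimal sketch of one possible baseline. It is not the authors' implementation: the function names, the selection rule (n-grams seen in at least `min_count` sensitive sentences and in no non-sensitive one), and the four toy training sentences are all assumptions made up for illustration and are not taken from the released corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(sentence, max_n=2):
    """Lowercase, split on whitespace, and collect all 1..max_n-grams."""
    tokens = sentence.lower().split()
    feats = set()
    for n in range(1, max_n + 1):
        feats.update(ngrams(tokens, n))
    return feats


def learn_keywords(sentences, labels, max_n=2, min_count=2):
    """Hypothetical selection rule: keep n-grams that occur in at least
    min_count sensitive sentences and in no non-sensitive sentence."""
    pos, neg = Counter(), Counter()
    for sentence, label in zip(sentences, labels):
        target = pos if label else neg
        target.update(extract_features(sentence, max_n))
    return {g for g, c in pos.items() if c >= min_count and neg[g] == 0}


def is_sensitive(sentence, keywords, max_n=2):
    """Flag a sentence if it contains any learned keyword n-gram."""
    return bool(extract_features(sentence, max_n) & keywords)


# Toy, invented examples (NOT from the Monsanto corpus).
train = [
    ("please do not forward this draft", True),
    ("do not forward the study results", True),
    ("the meeting is at noon", False),
    ("please bring the draft to the meeting", False),
]
sentences = [s for s, _ in train]
labels = [y for _, y in train]
keywords = learn_keywords(sentences, labels)
```

With the learned set, `is_sensitive("do not forward", keywords)` flags a new sentence by simple set intersection. A realistic baseline would add proper tokenization and a weighted scoring scheme; the sketch only shows the shape of the approach.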
