Enhancing Sensitivity Classification with Semantic Features using Word Embeddings

Abstract

Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. However, sensitivity is typically a product of the relations between combinations of terms, such as who said what about whom, therefore, automatic sensitivity classification is a difficult task. Vector representations of terms, such as word embeddings, have been shown to be effective at encoding latent term features that preserve semantic relations between terms, which can also be beneficial to sensitivity classification. In this work, we present a thorough evaluation of the effectiveness of semantic word embedding features, along with term and grammatical features, for sensitivity classification. On a test collection of government documents containing real sensitivities, we show that extending text classification with semantic features and additional term n-grams results in significant improvements in classification effectiveness, correctly classifying 9.99% more sensitive documents compared to the text classification baseline.
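
The idea of extending a text classifier's input with term n-grams and semantic embedding features can be illustrated with a minimal sketch. The abstract does not specify the feature construction or the classifier, so everything here is an assumption made for illustration: the toy two-dimensional embeddings, the fixed n-gram vocabulary, the mean-pooling of word vectors, and the helper names term_ngrams, mean_embedding and featurize are all hypothetical and are not the authors' method.

```python
from collections import Counter

# Toy 2-dimensional embeddings; hypothetical values for illustration only.
TOY_EMBEDDINGS = {
    "minister": [1.0, 0.0],
    "criticised": [0.0, 1.0],
    "ambassador": [1.0, 1.0],
}


def term_ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) found in tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of tokens that have an embedding; zeros if none do."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def featurize(text, vocabulary, embeddings, dim, max_n=2):
    """Concatenate n-gram counts over a fixed vocabulary with a mean embedding."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(term_ngrams(tokens, n))
    ngram_part = [float(counts[g]) for g in vocabulary]
    return ngram_part + mean_embedding(tokens, embeddings, dim)


# Assumed vocabulary: two unigrams and one bigram capturing "who said what".
vocabulary = [("minister",), ("ambassador",), ("minister", "criticised")]
features = featurize(
    "The minister criticised the ambassador", vocabulary, TOY_EMBEDDINGS, dim=2
)
```

The resulting vector holds the n-gram counts first and the pooled embedding last; in practice such a vector would be passed to whichever classifier serves as the text classification baseline.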