Conference: IEEE International Conference on Machine Learning and Applications
Cybersecurity Automated Information Extraction Techniques: Drawbacks of Current Methods, and Enhanced Extractors

Abstract

We address a crucial element of applied information extraction, the accurate identification of basic security entities in text, by evaluating previous methods and presenting new labelers. Our survey reveals that previous efforts have not been tested on documents similar to the targeted sources (news articles, blogs, tweets, etc.) and that no sufficiently large, publicly available annotated corpus of these documents exists. By assembling a representative test corpus, we perform a quantitative evaluation of previous methods in a realistic setting, revealing an overall lack of recall and giving insight into the models' beneficial and inhibiting elements. In particular, our results show that many previous efforts overfit to the non-representative test corpora in this domain. Informed by this evaluation, we present three novel cyber entity extractors, which seek to leverage the available labeled data but remain worthwhile on the more diverse documents encountered in the wild. Each new model advances the state of the art in recall, with maximal or near-maximal F1 score. Our results establish that the state of the art in cyber entity tagging is characterized by F1 = 0.61.
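
The abstract reports results in terms of recall and F1. As a reminder of how these metrics relate, here is a minimal sketch of entity-level scoring under exact-match evaluation. The abstract does not specify the paper's matching criterion or label set, so the span representation `(start, end, label)`, the function name `precision_recall_f1`, and the labels in the example are all assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is (start_token, end_token, label).
Entity = Tuple[int, int, str]


def precision_recall_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Score predicted entities against gold entities using exact span and label match."""
    true_positives = len(gold & predicted)
    # Precision: fraction of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative (made-up) annotations: four gold entities, three predictions, two correct.
gold = {(0, 1, "VENDOR"), (2, 3, "PRODUCT"), (7, 8, "VULNERABILITY"), (10, 11, "VERSION")}
predicted = {(0, 1, "VENDOR"), (2, 3, "PRODUCT"), (5, 6, "PRODUCT")}

p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the tagger misses half of the gold entities, so recall (0.5) pulls F1 below precision. This is the kind of recall shortfall the abstract describes for earlier extractors.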