Developing a Cybersecurity Text Corpus and its Application for Augmenting Semantic Text Similarity.



Abstract

The growing use of cyber-services makes cybersecurity increasingly important. The Internet is a primary source of information about software flaws, vulnerabilities, cyber-attacks and exploits. This information is available through vulnerability databases, news articles, security bulletins and blogs. A variety of applications and security systems, such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), can take advantage of this information to consolidate their infrastructure. The lack of a ready, high-quality text corpus of security information drawn from various sources makes it difficult for these applications to use this information. To overcome this problem, our work focuses on building a multi-genre corpus of security text using information retrieved from multiple Internet-based sources: the National Vulnerability Database, Wikipedia articles, security blogs, security bulletins and scholarly papers. The system builds a text classifier from the initial high-quality data, which is then used to classify new data from these sources and add it to the corpus.

This corpus can be used by a variety of applications, such as IDS or IPS, in a variety of ways, such as assertion into a knowledge base or extraction of named entities. Our work explores one such application: generating a semantic text similarity model for cybersecurity text. We use the multi-genre cybersecurity text corpus to create a word co-occurrence model. This model can capture synonymy between different security terms. For example, the words 'virus' and 'malware', which appear in the same contexts, are scored for their degree of similarity. The word co-occurrence model is then extended to generate a semantic text similarity model. The text similarity model measures the semantic similarity between different security texts, such as paper titles, vulnerability descriptions and blog paragraphs. The system also develops a combined text similarity model from the cybersecurity similarity model and a generic text similarity model. This model can be used in document mining for matching security text, clustering documents that describe similar vulnerabilities, and so on.
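The pipeline described in the abstract (word co-occurrence model, then text similarity, then a combined model) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual algorithms, so everything below is an assumption made for illustration: the toy corpus, the window size, cosine similarity over raw co-occurrence counts, the best-match averaging used for text similarity, Jaccard token overlap as a stand-in for the generic text similarity model, and the mixing weight `alpha`. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def word_similarity(vectors, w1, w2):
    """Score two words by how similar their co-occurrence contexts are."""
    if w1 == w2:
        return 1.0
    return cosine(vectors.get(w1, Counter()), vectors.get(w2, Counter()))


def text_similarity(vectors, text1, text2):
    """Symmetric average of each word's best match in the other text (assumed scheme)."""
    t1, t2 = text1.lower().split(), text2.lower().split()
    if not t1 or not t2:
        return 0.0

    def directed(src, dst):
        return sum(max(word_similarity(vectors, s, d) for d in dst) for s in src) / len(src)

    return (directed(t1, t2) + directed(t2, t1)) / 2


def jaccard(text1, text2):
    """Stand-in for a generic text similarity model: token-set overlap."""
    a, b = set(text1.lower().split()), set(text2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(vectors, text1, text2, alpha=0.5):
    """Weighted mix of the domain-specific and generic scores (alpha is an assumption)."""
    return alpha * text_similarity(vectors, text1, text2) + (1 - alpha) * jaccard(text1, text2)


# Toy corpus, invented for this example only.
corpus = [
    "the virus infected the server",
    "the malware infected the server",
    "the patch fixed the vulnerability",
]
vectors = cooccurrence_vectors(corpus)

print(word_similarity(vectors, "virus", "malware"))
print(word_similarity(vectors, "virus", "patch"))
print(combined_similarity(vectors, "virus infected server", "malware infected server"))
```

In this toy corpus 'virus' and 'malware' occur in identical contexts, so they score higher than 'virus' and 'patch'. With raw counts, frequent words such as "the" dominate the vectors; a weighting scheme over the counts would likely be needed on a real corpus.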

Bibliographic record

  • Author

    Chavan, Manish Padmakar.

  • Author affiliation

    University of Maryland, Baltimore County.

  • Degree-granting institution: University of Maryland, Baltimore County.
  • Discipline: Computer Science.
  • Degree: M.S.
  • Year: 2014
  • Pages: 78 p.
  • Total pages: 78
  • Original format: PDF
  • Language of text: eng
  • Date indexed: 2022-08-17 11:53:28
