Developing a Cybersecurity Text Corpus and its Application for Augmenting Semantic Text Similarity.

Abstract

The growing use of cyber-services makes cybersecurity increasingly important. The Internet is a primary source of information on software flaws, vulnerabilities, cyber-attacks, and exploits. This information is available through vulnerability databases, news articles, security bulletins, and blogs. A variety of applications and security systems, such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), can use this information to strengthen their infrastructure. However, the lack of a ready-made, high-quality text corpus of security information drawn from various sources makes it difficult for these applications to use it. To overcome this problem, our work builds a multi-genre corpus of security text from information retrieved from multiple Internet sources: the National Vulnerability Database, Wikipedia articles, security blogs, security bulletins, and scholarly papers. The system trains a text classifier on the initial high-quality data and uses it to classify new data from these sources and add it to the corpus.

This corpus can be used by applications such as IDS or IPS in a variety of ways, for example assertion into a knowledge base or extraction of named entities. Our work explores one such application: generating a semantic text similarity model for cybersecurity text. We use the multi-genre cybersecurity text corpus to create a word co-occurrence model, which can capture synonymy between different security terms. For example, the words 'virus' and 'malware', which occur in the same contexts, are scored for their degree of similarity. The word co-occurrence model is then extended into a semantic text similarity model, which measures the semantic similarity between different security texts such as paper titles, vulnerability descriptions, and blog paragraphs. The system also builds a combined text similarity model from the cybersecurity similarity model and a generic text similarity model. This combined model can be used in document mining, for example to match security text or to cluster documents that describe similar vulnerabilities.
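The abstract outlines three steps: scoring word similarity from a word co-occurrence model built on the corpus, extending that into text similarity, and blending the result with a generic text similarity model. The abstract does not specify how each step is computed, so the following is only a minimal sketch of one common approach, not the thesis's method. It assumes cosine similarity over windowed co-occurrence counts for word similarity, an average of best-matching word scores for text similarity, and Jaccard token overlap as a hypothetical stand-in for the generic model; the window size, stopword list, `alpha` weight, toy corpus, and all function names are illustrative.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_cooccurrence(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    model[word][tokens[j]] += 1
    return model


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[w] for w, count in u.items() if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_similarity(model, w1, w2):
    """Words that occur in similar contexts get similar co-occurrence vectors."""
    if w1 == w2:
        return 1.0
    # .get avoids inserting unseen words into the defaultdict
    return cosine(model.get(w1, Counter()), model.get(w2, Counter()))


def text_similarity(model, text_a, text_b):
    """Average best-match word similarity, computed in both directions."""
    a, b = tokenize(text_a), tokenize(text_b)
    if not a or not b:
        return 0.0

    def directed(xs, ys):
        return sum(max(word_similarity(model, x, y) for y in ys) for x in xs) / len(xs)

    return (directed(a, b) + directed(b, a)) / 2


def generic_similarity(text_a, text_b):
    """Hypothetical stand-in for a generic model: Jaccard overlap of token sets."""
    a, b = set(tokenize(text_a)), set(tokenize(text_b))
    return len(a & b) / len(a | b) if a | b else 0.0


def combined_similarity(model, text_a, text_b, alpha=0.5):
    """Weighted blend of the domain-specific and generic scores."""
    domain = text_similarity(model, text_a, text_b)
    generic = generic_similarity(text_a, text_b)
    return alpha * domain + (1 - alpha) * generic


if __name__ == "__main__":
    corpus = [
        "The virus infected the host and spread.",
        "The malware infected the host and spread.",
        "The patch fixed the vulnerability.",
    ]
    model = build_cooccurrence(corpus)
    print(round(word_similarity(model, "virus", "malware"), 3))  # 1.0
    print(word_similarity(model, "virus", "patch"))  # 0.0
    print(combined_similarity(model, "virus infected host", "malware infected host"))
```

In the toy corpus, 'virus' and 'malware' share the neighbors 'infected' and 'host', so their co-occurrence vectors point the same way and score close to 1, while 'virus' and 'patch' share no neighbors and score 0. The `alpha` weight controls how much the domain-specific score dominates the generic one; a real corpus would need far more text, and a practical model would likely add weighting or dimensionality reduction on top of raw counts.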

Bibliographic record

Author: Chavan, Manish Padmakar
Author affiliation: University of Maryland, Baltimore County
Degree-granting institution: University of Maryland, Baltimore County
Discipline: Computer Science
Degree: M.S.
Year: 2014
Pages: 78
Original format: PDF
Language: English
Date added to database: 2022-08-17 11:53:28

Similar literature

1. A French clinical corpus with comprehensive semantic annotations: development of the Medical Entity and Relation LIMSI annotated Text corpus (MERLOT) [J]. Campillos Leonardo, Deleger Louise, Grouin Cyril. Language Resources and Evaluation, 2018, No. 2.
2. Building semantically annotated corpus for text classification of Indian defence news articles [J]. aurabh A. Kanekar, Alind Sharma, Gaurang S. Patkar. International Journal of Information Technology, 2021, No. 4.
3. A Semantic Framework for Extracting Taxonomic Relations from Text Corpus [J]. Phuoc Thi Hong Doan, Arch-int Ngamnij, Arch-int Somjit. The International Arab Journal of Information Technology, 2020, No. 3.
4. Ontology Learning from Text Using Automatic Ontological-Semantic Text Annotation and the Web as the Corpus [C]. Jesse English, Sergei Nirenburg. AAAI Symposium on Machine Reading, 2007.
5. Semantic preserving text representation and its applications in text clustering [D]. Howard, Michael. 2012.
6. Interoperability of text corpus annotations with the semantic web [O]. Karin Verspoor, Jin-Dong Kim, Michel Dumontier. 2015.
7. Enriching Augmented Reality with Text Data Mining: An Automated Content Management System to Develop Hybrid Media Applications [O]. Raso Rocco, Werth Dirk, Loos Peter. 2017.
8. Attitudinal Modeling of Affect, Behavior and Cognition: Semantic Mining of Disaster Text Corpus [R]. Khalid, H. M., Radha, J. K., Helander, M. G. 2010.