
Software Entity Recognition Method Based on BERT Embedding



Abstract

The global open source software ecosystem contains rich information about software engineering. Existing methods for analyzing the text content of knowledge communities in this field mainly focus on structural relationships and on rule-based association and mining. This paper proposes a software entity recognition method based on BERT word embeddings. First, a BiLSTM-CRF entity recognition model is constructed using word vector embeddings from the software engineering field. Then, the word vectors in the input layer of the model are improved by introducing the BERT pre-trained language model. The pre-training data is constructed from the discussion content of the Stack Overflow software Q&A community and used to pre-train the BERT model, yielding word vector representations suited to the software engineering field. This improves entity recognition in software engineering and addresses the problem that traditional word vector embeddings are mostly trained on general-domain data, which is not fully suitable for software engineering and cannot represent contextual semantic information well. At the same time, to address the scarcity of annotated data in the software field, the paper tries to extend the data appropriately through model prediction and dictionary matching, and tests this experimentally. Finally, the paper uses deep learning to realize entity recognition in the field of software engineering, providing support for the extraction of software entities, the construction of software knowledge bases, and intelligent applications of software engineering.
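The abstract mentions extending scarce annotated data by dictionary matching but does not describe the procedure. As a rough illustration only, the sketch below labels tokens with BIO tags using greedy longest-match lookup against a small dictionary. The function name `bio_tag_by_dictionary`, the dictionary contents, and the entity labels (`PLATFORM`, `LANGUAGE`, `LIBRARY`) are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def bio_tag_by_dictionary(
    tokens: List[str], dictionary: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup (case-insensitive)."""
    max_len = max((len(key) for key in dictionary), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in dictionary:
                label = dictionary[key]
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Hypothetical dictionary of software entities (assumption, not from the paper).
SOFTWARE_DICT = {
    ("stack", "overflow"): "PLATFORM",
    ("python",): "LANGUAGE",
    ("numpy",): "LIBRARY",
}

tokens = "I asked on Stack Overflow how to use NumPy in Python".split()
tags = bio_tag_by_dictionary(tokens, SOFTWARE_DICT)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Sentences labeled this way could serve as additional, noisier training examples for a sequence tagger such as the BiLSTM-CRF model the abstract names. How the authors combine dictionary matches with model predictions is not stated in the abstract.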


Bibliographic details

Source
International Conference on Machine Learning for Cyber Security | 2020 | pp. 33-47 | 15 pages
Authors
Chao Sun; Mingjing Tang; Li Liang; Wei Zou
Original format: PDF
Keywords
Entity recognition; BERT model; Stack Overflow

