Multi-label Classification of Commit Messages using Transfer Learning

Abstract

Commit messages are used in industry by developers to annotate changes made to the code. Accurate classification of these messages can help monitor the software evolution process and enable better tracking for various industrial stakeholders. In this paper, we present a state-of-the-art method for classifying commit messages into categories according to Swanson’s maintenance activities, i.e., “Corrective”, “Perfective”, and “Adaptive”. This is a challenging task because not all commit messages are well written and informative. Existing approaches rely on keyword-based techniques to solve this problem. However, these approaches are oblivious to the full language model and do not recognize the contextual relationships between words. The state-of-the-art methodology in Natural Language Processing (NLP) is to train a context-aware neural network (Transformer) on a very large data set that encompasses the entire language and then fine-tune it for a specific task. In this way, the model can learn the language, pay attention to context, and then transfer that knowledge to achieve better performance on the specific task. We use an off-the-shelf neural network called DistilBERT and fine-tune it for the commit message classification task. This step is non-trivial because programming languages and commit messages have unique keywords, jargon, and idioms. This paper presents our effort in training this model and constructing the data set for this task. We describe the rules used to construct the data set. We validate our approach on industrial projects from GitHub, such as Kubernetes, Linux, TensorFlow, Spark, TypeScript, and PyTorch. We were able to achieve an 87% F1-score on the commit message classification task, which is an order of magnitude more accurate than previous studies.
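
As a rough illustration of the multi-label setting described above, here is a minimal sketch. It assumes a classifier that emits one logit per Swanson category. It also assumes an independent sigmoid with a 0.5 cutoff per label and micro-averaged F1. The abstract does not state the output layer, threshold, or F1 averaging the authors used. The helper names (`to_multi_hot`, `predict_from_logits`, `micro_f1`) are hypothetical. The labels and logits are invented for illustration and are not from the paper's data set.

```python
import math

# The three Swanson maintenance categories named in the abstract.
LABELS = ["Corrective", "Perfective", "Adaptive"]


def to_multi_hot(categories):
    """Encode a set of category names as a 0/1 vector in LABELS order."""
    return [1 if label in categories else 0 for label in LABELS]


def predict_from_logits(logits, threshold=0.5):
    """Apply an independent sigmoid to each logit; keep labels at or above threshold.

    Assumption: one logit per category and a 0.5 cutoff; the paper's actual
    output layer and threshold are not stated in the abstract.
    """
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool every (message, label) decision before scoring."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels for three commit messages (not from the paper's data set).
y_true = [
    to_multi_hot({"Corrective"}),
    to_multi_hot({"Perfective", "Adaptive"}),
    to_multi_hot({"Adaptive"}),
]
# Hypothetical per-label logits standing in for a fine-tuned model's outputs.
logits = [[2.0, -1.0, -3.0], [-2.0, 1.5, -0.5], [-1.0, -2.0, 0.3]]
y_pred = [predict_from_logits(row) for row in logits]

print(y_pred)                              # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(round(micro_f1(y_true, y_pred), 3))  # 0.857
```

The second message shows why the task is multi-label: its gold vector marks two categories at once, and the missed "Adaptive" label lowers recall. Macro-averaging would weight each category equally instead and could give a different score from the micro-averaged value shown here.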

Bibliographic record

Source: IEEE International Symposium on Software Reliability Engineering Workshops, 2020, pp. 37-42 (6 pages)
Authors: Muhammad Usman Sarwar; Sarim Zafar; Mohamed Wiem Mkaouer; Gursimran Singh Walia; Muhammad Zubair Malik
Original format: PDF
Keywords: Task analysis; Bit error rate; Software development management; Context modeling; Data models; Natural language processing; Maintenance engineering

Similar literature

1. Single- and multi-label classification of construction objects using deep transfer learning methods [J]. Nipun D. Nath, Theodora Chaspari, Amir H. Behzadan. Electronic Journal of Information Technology in Construction, 2019, no. 6.
2. Multi-label learning based deep transfer neural network for facial attribute classification [J]. Zhuang Ni, Yan Yan, Chen Si. Pattern Recognition: The Journal of the Pattern Recognition Society, 2018.
3. Extreme multi-label learning: A large scale classification approach in machine learning [J]. Purvi Prajapati. Journal of Information and Optimization Sciences, 2019, no. 4.
4. Multi-label Bird Species Classification Using Transfer Learning [C]. Rajeev Rajan, Noumida A. International Conference on Communication, Control and Information Sciences, 2021.
5. Leveraging Label Information in Representation Learning for Multi-Label Text Classification [D]. Wu, Jiayu. 2019.
6. Towards multi-label classification: Next step of machine learning for microbiome research [O]. Shunyao Wu, Yuzhu Chen, Zhiruo Li. 2021.
7. Improving Multi-label Emotion Classification via Sentiment Classification with Dual Attention Transfer Network [O]. Jianfei Yu, Luís Marujo, Jing Jiang. 2018.