IEEE Annual Computer Software and Applications Conference

The Expansion of Source Code Abbreviations Using a Language Model

Abstract

Programmers often abbreviate identifier names in source code to represent single words (unigrams) or phrases (multigrams). However, the difficulty of recovering the original word(s) behind an abbreviation during the maintenance phase makes the source code harder to comprehend, and an incorrect expansion may introduce defects into the code. Many approaches automatically expand abbreviations to their original words; unfortunately, they rely on predefined patterns and single-word dictionaries, which cannot handle abbreviations that expand to phrases. In this paper, we describe a bigram-based inference model that uses the statistical properties of unigrams as evidence to retrieve the original word automatically. We evaluated our approach on a set of 100 abbreviations randomly picked from eight open source projects and found that it correctly expands 78% of the set.
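
The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that candidate words are matched to an abbreviation by an in-order letter match, that unigram and bigram counts come from a small text corpus, and that add-one smoothing is used. The names `build_counts`, `candidate_expansions`, `score`, `expand`, and the toy `CORPUS` are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

# Toy corpus (an assumption for illustration; a real one would be much larger).
CORPUS = [
    "read the file name from the config",
    "the file name is empty",
    "update the file name",
    "flush the string buffer",
]

def build_counts(lines):
    """Count unigrams and within-line bigrams."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        tokens = line.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def is_subsequence(abbr, word):
    """True if abbr and word share a first letter and abbr's letters occur in word in order."""
    if not abbr or not word or abbr[0] != word[0]:
        return False
    letters = iter(word)
    return all(ch in letters for ch in abbr)

def candidate_expansions(abbr, vocabulary):
    """Return one-word and two-word candidates (as tuples) that cover abbr."""
    candidates = [(w,) for w in vocabulary if is_subsequence(abbr, w)]
    for split in range(1, len(abbr)):
        lefts = [w for w in vocabulary if is_subsequence(abbr[:split], w)]
        rights = [w for w in vocabulary if is_subsequence(abbr[split:], w)]
        candidates.extend(product(lefts, rights))
    return candidates

def score(candidate, unigrams, bigrams):
    """Log-probability of a candidate: smoothed unigram term plus bigram terms."""
    total, vocab = sum(unigrams.values()), len(unigrams)
    logp = math.log((unigrams[candidate[0]] + 1) / (total + vocab))
    for prev, word in zip(candidate, candidate[1:]):
        logp += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))
    return logp

def expand(abbr, unigrams, bigrams):
    """Return the highest-scoring expansion as a tuple of words, or None."""
    candidates = candidate_expansions(abbr, list(unigrams))
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, unigrams, bigrams))

if __name__ == "__main__":
    unigrams, bigrams = build_counts(CORPUS)
    for abbr in ("fn", "buf", "strbuf"):
        print(abbr, "->", expand(abbr, unigrams, bigrams))
```

In this sketch every extra word adds a negative log term, so a single-word candidate beats a two-word one whenever both match; a practical model would need some normalization or additional evidence to balance the two cases.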
