Expanding identifiers to normalize source code vocabulary

Abstract

Maintaining modern software requires significant tool support. Effective tools exploit a variety of information and techniques to aid a software maintainer. One area of recent interest in tool development exploits the natural language information found in source code. Such Information Retrieval (IR) based tools complement traditional static analysis tools and have tackled problems, such as feature location, that otherwise require considerable human effort. To reap the full benefit of IR-based techniques, the language used across all software artifacts (e.g., requirements, design, change requests, tests, and source code) must be consistent. Unfortunately, there is a significant proportion of invented vocabulary in source code. Vocabulary normalization aligns the vocabulary found in the source code with that found in other software artifacts. Most existing work related to normalization has focused on splitting an identifier into its constituent parts. The next step is to expand each part into a (dictionary) word that matches the vocabulary used in other software artifacts. Building on a successful approach to splitting identifiers, an implementation of an expansion algorithm is presented. Experiments on two systems find that up to 66% of identifiers are correctly expanded, which is within about 20% of the current system's best-case performance. Not only is this performance comparable to previous techniques, but the result is achieved in the absence of special purpose rules and is not limited to restricted syntactic contexts. Results from these experiments also show the impact that varying levels of documentation (including both internal documentation such as the requirements and design, and external, or user-level, documentation) have on the algorithm's performance.
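
The split-then-expand idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify; it assumes a simple camelCase/underscore splitter and a subsequence-matching heuristic against a vocabulary taken from other artifacts. All function names (`split_identifier`, `expand_part`, `expand_identifier`) and the sample vocabulary are hypothetical.

```python
import re


def split_identifier(identifier):
    """Split an identifier on underscores and camelCase boundaries.

    A simple heuristic splitter, assumed here for illustration only.
    """
    parts = []
    for chunk in re.split(r"[_\W]+", identifier):
        # Acronym runs (HTTP), capitalized or lowercase words, and digit runs.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts if p]


def is_subsequence(abbrev, word):
    """Return True if the letters of abbrev appear in word in the same order."""
    remaining = iter(word)
    return all(ch in remaining for ch in abbrev)


def expand_part(part, vocabulary):
    """Expand one part to a vocabulary word, or return it unchanged.

    Assumed heuristic: a candidate must start with the same letter and
    contain the part as a subsequence; the shortest candidate wins, with
    ties broken alphabetically.
    """
    if part in vocabulary:
        return part
    candidates = [
        w for w in vocabulary
        if w and w[0] == part[0] and is_subsequence(part, w)
    ]
    if not candidates:
        return part
    return min(candidates, key=lambda w: (len(w), w))


def expand_identifier(identifier, vocabulary):
    """Split an identifier and expand each part against the vocabulary."""
    return [expand_part(p, vocabulary) for p in split_identifier(identifier)]


# Hypothetical vocabulary, standing in for words found in documentation.
vocabulary = {"string", "length", "count", "button", "index"}
print(expand_identifier("strLen", vocabulary))
print(expand_identifier("btn_cnt", vocabulary))
```

In this sketch the vocabulary stands in for the words found in requirements, design documents, and user documentation, so a richer vocabulary gives the expander more candidates to match, which is in the spirit of the documentation effect the abstract reports.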

Bibliographic details

Source: 2011 27th IEEE International Conference on Software Maintenance, 2011, pp. 113-122 (10 pages)
Original format: PDF
CLC classification: Computer software
Keywords: natural language processing; program comprehension; source code analysis tools
