Vocabulary normalization improves IR-based concept location

Venue: IEEE International Conference on Software Maintenance

Abstract

Tool support is crucial to modern software development, evolution, and maintenance. Early tools reused the static analysis performed by the compiler. These were followed by dynamic analysis tools and, more recently, by tools that exploit natural language. This latter class has the advantage that it can incorporate not only the code but also artifacts from all phases of software construction and its subsequent evolution. Unfortunately, the natural language found in source code often uses a vocabulary different from that used in other software artifacts, which aggravates the vocabulary mismatch problem. The problem exists because many natural-language tools imported from Information Retrieval (IR) and Natural Language Processing (NLP) implicitly assume a single natural-language vocabulary. Vocabulary normalization, which goes well beyond simple identifier splitting, brings the vocabulary of the source code into line with that of other artifacts. Consequently, it is expected to improve the performance of existing and future IR- and NLP-based tools. As a case study, an experiment with an LSI-based feature locator is replicated. Normalization universally improves performance. For the tersest queries, the improvement is over 180% (p < 0.0001).
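To make the distinction between identifier splitting and normalization concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes that normalization can be approximated by splitting an identifier into tokens and then expanding abbreviations through a lookup table. The `EXPANSIONS` table and the function names are hypothetical and exist only for illustration; the abstract does not describe how expansions are actually chosen.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch). A real
# normalizer would have to derive expansions from other sources rather than
# from a hand-written dictionary.
EXPANSIONS = {
    "cnt": "count",
    "str": "string",
    "idx": "index",
    "msg": "message",
    "btn": "button",
}

# Matches, in order of preference: an acronym run (uppercase letters not
# followed by a lowercase letter), a capitalized or lowercase word, or digits.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(identifier):
    """Simple splitting: break on underscores, camelCase boundaries, digits."""
    return [token.lower() for token in _TOKEN.findall(identifier)]


def normalize(identifier):
    """Split, then map each token to a natural-language word where known."""
    return [EXPANSIONS.get(token, token) for token in split_identifier(identifier)]


if __name__ == "__main__":
    for name in ("msgCnt", "str_idx2", "HTMLParser"):
        print(name, split_identifier(name), normalize(name))
```

Splitting alone turns `msgCnt` into `msg` and `cnt`, which still would not match a query containing "message" or "count"; the expansion step is what aligns the code vocabulary with the wording of other artifacts.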
