International Joint Conference on Neural Networks (conference paper)

Hierarchical Embedding for Code Search in Software QA Sites

Abstract

In recent years, code search on software Q&A sites has attracted growing attention, driven by the needs of software development. Most existing work treats code snippets as plain text fragments, ignoring the structural information of code (i.e., its sequential information). In addition, much existing work does not model the interaction between code snippets and queries. In this paper, we propose HECS (Hierarchical Embedding for Code Search), a novel deep neural network that addresses these problems. HECS divides the embedding of code and queries into two hierarchies, capturing latent information with two modules: an intra-language encoding module and a cross-language encoding module. The intra-language encoding module is implemented with ON-LSTM (Ordered Neurons LSTM), a variant of LSTM (Long Short-Term Memory), to capture the keyword order structure of code. The cross-language encoding module computes interaction information between code and queries using an attention mechanism. In this way, the similarity between a query and its corresponding code snippets in the vector space is captured more effectively, and HECS distinguishes positive from negative samples more accurately. We empirically evaluate HECS on a large-scale codebase collected from StackOverflow. The experimental results show that our approach achieves state-of-the-art performance.
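
The abstract names ON-LSTM as the intra-language encoder but gives no equations. As a rough illustration, the sketch below implements a single ON-LSTM cell in plain Python following the general ordered-neurons formulation: a `cumax` (cumulative softmax) activation yields a master forget gate that rises across the hidden neurons and a master input gate that falls, so neurons late in the ordering tend to keep long-range information while early neurons are overwritten more often. Everything here is an assumption for illustration: the names `ONLSTMCell`, `cumax`, `affine`, and `encode` are hypothetical, weights are random rather than trained, the chunking of master gates used in ON-LSTM implementations is omitted (one master-gate value per neuron), and HECS's actual dimensions, tokenization, and training setup are not described in the abstract.

```python
import math
import random


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def cumax(v):
    """Cumulative softmax: a non-decreasing vector whose last entry is 1."""
    out, acc = [], 0.0
    for p in softmax(v):
        acc += p
        out.append(acc)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Return W x + U h + b for plain nested-list matrices."""
    return [
        sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h)) + b[r]
        for r in range(len(b))
    ]


class ONLSTMCell:
    # Standard LSTM gates plus the two ON-LSTM master gates (hypothetical naming).
    GATES = ("forget", "input", "output", "cell", "master_forget", "master_input")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Untrained random weights: (W, U, b) per gate.
        self.params = {
            g: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for g in self.GATES
        }

    def step(self, x, h, c):
        pre = {g: affine(*self.params[g], x, h) for g in self.GATES}
        f = [sigmoid(v) for v in pre["forget"]]
        i = [sigmoid(v) for v in pre["input"]]
        o = [sigmoid(v) for v in pre["output"]]
        c_hat = [math.tanh(v) for v in pre["cell"]]
        mf = cumax(pre["master_forget"])                    # non-decreasing, ends at 1
        mi = [1.0 - v for v in cumax(pre["master_input"])]  # non-increasing, ends at 0
        new_c = []
        for k in range(self.hidden_size):
            w = mf[k] * mi[k]               # overlap of the two master gates
            f_hat = f[k] * w + (mf[k] - w)  # forget gate modulated by the ordering
            i_hat = i[k] * w + (mi[k] - w)  # input gate modulated by the ordering
            new_c.append(f_hat * c[k] + i_hat * c_hat[k])
        new_h = [o[k] * math.tanh(new_c[k]) for k in range(self.hidden_size)]
        return new_h, new_c


def encode(cell, tokens):
    """Run the cell over a sequence of token vectors; return the final hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in tokens:
        h, c = cell.step(x, h, c)
    return h
```

For example, `encode(ONLSTMCell(input_size=3, hidden_size=4), token_vectors)` returns a 4-dimensional vector for a snippet whose tokens are embedded as 3-dimensional vectors. In a real system the token embeddings and gate weights would be learned, and a framework implementation would replace the nested-list arithmetic.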
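
The cross-language module is described only as attention-based interaction followed by a vector-space similarity that separates positive from negative samples. One common way to realize this, used here as a stand-in rather than HECS's published formulation, is attentive pooling: build an interaction matrix between query and code token states, weight each token by its strongest match on the other side, pool each side, and compare the pooled vectors with cosine similarity. The margin-based hinge loss over (query, positive snippet, negative snippet) triples is likewise an assumption, since the abstract does not state the training objective. The function names and the `margin=0.05` default are hypothetical, and the bilinear weight matrix often placed inside the interaction term is replaced by the identity to keep the sketch short.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]


def attentive_pool(query_states, code_states):
    """Pool each side with weights derived from its interaction with the other side.

    G[i][j] = tanh(q_i . c_j). Query token i is weighted by its best match in the
    code (row max); code token j is weighted by its best match in the query (column max).
    """
    G = [[math.tanh(dot(q, c)) for c in code_states] for q in query_states]
    q_weights = softmax([max(row) for row in G])
    c_weights = softmax([max(G[i][j] for i in range(len(G))) for j in range(len(code_states))])
    return weighted_sum(q_weights, query_states), weighted_sum(c_weights, code_states)


def score(query_states, code_states):
    """Cosine similarity between the attentively pooled query and code vectors."""
    q, c = attentive_pool(query_states, code_states)
    return cosine(q, c)


def ranking_loss(query_states, pos_code, neg_code, margin=0.05):
    """Hinge loss: the positive snippet should outscore the negative one by `margin`."""
    return max(0.0, margin - score(query_states, pos_code) + score(query_states, neg_code))


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.9, 0.1]]
    pos = [[1.0, 0.0], [0.8, 0.2]]
    neg = [[0.0, 1.0], [-0.1, 0.9]]
    print(score(query, pos) > score(query, neg))  # True
    print(ranking_loss(query, pos, neg))  # 0.0
```

The toy vectors are hand-picked so that `pos` points roughly in the same direction as the query tokens while `neg` is nearly orthogonal to them; with trained encoders, the token states would come from the intra-language module rather than being written by hand.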
