Hierarchical Embedding for Code Search in Software QA Sites


Abstract

In recent years, code search on software Q&A sites has attracted growing interest, driven by the needs of software development. Most existing work treats code snippets as plain text fragments and ignores the structural information (i.e., sequential information) of the code. Much of the existing work also does not take into account the interaction between code snippets and queries.

In this paper, we propose a novel deep neural network named HECS (Hierarchical Embedding for Code Search) to address these problems. Our method divides the embedding of code and queries into two hierarchies: the potential information is captured by two modules, the Intra-language encoding module and the Cross-language encoding module. The Intra-language encoding module is implemented with ON-LSTM (ordered neurons LSTM), a variant of LSTM (Long Short-Term Memory), which captures the keyword order structure of the code. The Cross-language encoding module computes interaction information between query and code using the attention mechanism. In this way, the similarity between a query and its corresponding code snippets in the vector space can be better captured, and HECS can distinguish positive from negative samples more accurately.

We empirically evaluate HECS on a large-scale codebase collected from StackOverflow. The experimental results show that our approach achieves state-of-the-art performance.
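
To make the idea of attention-based interaction between a query and a code snippet more concrete, here is a minimal sketch in plain Python. It is not the HECS implementation: the token vectors are hand-written stand-ins for the output of an intra-language encoder (no ON-LSTM is implemented here), the attention step is a simple attentive-pooling scheme chosen for illustration, and the names `match_score`, `ranking_loss`, and the `margin` value are all hypothetical. The loss is a generic margin ranking loss over one positive and one negative snippet, assumed here as one common way to separate positive from negative samples.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def match_score(query_tokens, code_tokens):
    """Hypothetical cross-language step: pool each side using attention
    weights derived from the other side, then compare the pooled vectors."""
    # Interaction matrix: sim[i][j] compares query token i with code token j.
    sim = [[dot(q, c) for c in code_tokens] for q in query_tokens]
    # Each token is weighted by its strongest match on the other side.
    query_weights = softmax([max(row) for row in sim])
    code_weights = softmax([max(col) for col in zip(*sim)])
    query_vec = weighted_sum(query_weights, query_tokens)
    code_vec = weighted_sum(code_weights, code_tokens)
    return cosine(query_vec, code_vec)

def ranking_loss(pos_score, neg_score, margin=0.5):
    # Zero once the positive snippet outscores the negative one by `margin`.
    return max(0.0, margin - pos_score + neg_score)

# Toy token vectors (assumed; a real system would learn these).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
matching_code = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
unrelated_code = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

pos = match_score(query, matching_code)
neg = match_score(query, unrelated_code)
loss = ranking_loss(pos, neg)
```

In this toy setup the matching snippet shares the query's directions in the vector space and the unrelated one does not, so the positive score exceeds the negative score by more than the margin and the loss is zero. Swapping the two scores would produce a positive loss, which is the signal a trained model would use to push the pair apart.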


Bibliographic details

Source
International Joint Conference on Neural Networks | 2020 | pp. 1-10 | 10 pages
Authors
Ruitong Li; Gang Hu; Min Peng
Original format: PDF
Keywords
Encoding; Software; Logic gates; Search problems; Biological neural networks


