CoSQA: 20,000+ Web Queries for Code Search and Question Answering

Abstract

Finding code given a natural language query is beneficial to the productivity of software developers. Future progress towards better semantic matching between queries and code requires richer supervised training resources. To address this, we introduce the CoSQA dataset. It includes 20,604 labels for pairs of natural language queries and code, each annotated by at least 3 human annotators. We further introduce a contrastive learning method dubbed CoCLR to enhance query-code matching, which works as a data augmenter that brings in more artificially generated training instances. We show that, evaluated on CodeXGLUE with the same CodeBERT model, training on CoSQA improves the accuracy of code question answering by 5.1%, and incorporating CoCLR brings a further improvement of 10.5%.
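The abstract describes CoCLR only as a data augmenter that produces extra, artificially generated training instances. As an illustration of that idea, the sketch below assumes an in-batch scheme: every labeled matching (query, code) pair in a batch is kept as a positive, and each query is also paired with the other codes in the same batch to create non-matching examples. The function name `in_batch_augment` and this exact scheme are hypothetical and are not taken from the abstract.

```python
from typing import List, Tuple


def in_batch_augment(batch: List[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Hypothetical in-batch augmentation sketch (not the paper's code).

    Each (query, code) tuple in `batch` is assumed to be a matching pair,
    labeled 1. Pairing a query with every *other* code in the batch yields
    artificially generated non-matching pairs, labeled 0.
    """
    examples: List[Tuple[str, str, int]] = []
    for i, (query, code) in enumerate(batch):
        # Keep the original human-labeled matching pair.
        examples.append((query, code, 1))
        for j, (_, other_code) in enumerate(batch):
            if j != i:
                # Assumption: another query's code does not answer this query.
                examples.append((query, other_code, 0))
    return examples


if __name__ == "__main__":
    batch = [
        ("sort a list", "sorted(xs)"),
        ("read a file", "open(path).read()"),
        ("reverse a string", "s[::-1]"),
    ]
    for example in in_batch_augment(batch):
        print(example)
```

A batch of n matching pairs yields n positives plus n(n - 1) generated negatives, so three pairs become nine training instances. The scheme relies on the assumption that no two queries in a batch share a correct answer; if they do, some generated "negatives" are actually false negatives.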
