Journal: PeerJ Computer Science
Image-based many-language programming language identification

Abstract

Programming language identification (PLI) is a common need in automatic program comprehension as well as a prerequisite for deeper forms of code understanding. Image-based approaches to PLI have recently emerged and are appealing because they apply to code screenshots and programming video tutorials. However, they remain limited to recognizing a small number of programming languages (up to 10 languages in the literature). We show that it is possible to perform image-based PLI on a large number of programming languages (up to 149 in our experiments) with high (92%) precision and recall, using convolutional neural networks (CNNs) and transfer learning, starting from readily available pretrained CNNs. Results were obtained on a large real-world dataset of 300,000 code snippets extracted from popular GitHub repositories. By scrambling specific character classes and comparing identification performance, we also show that the characters that contribute the most to the visual recognizability of programming languages are symbols (e.g., punctuation, mathematical operators and parentheses), followed by alphabetic characters, with digits and indentation having a negligible impact.
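
The character-class scrambling mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "scrambling" means replacing each character of the chosen class with a random character from the same class while leaving everything else untouched; the abstract does not specify the exact procedure, so the class definitions (CHAR_CLASSES) and the function name (scramble) below are hypothetical and for illustration only. Indentation scrambling is omitted.

```python
import random
import string

# Hypothetical character classes; the abstract does not give exact definitions.
CHAR_CLASSES = {
    "symbols": string.punctuation,
    "alphabetic": string.ascii_letters,
    "digits": string.digits,
}


def scramble(code: str, char_class: str, seed: int = 0) -> str:
    """Replace every character of the given class with a random character
    of the same class; all other characters are kept as they are."""
    alphabet = CHAR_CLASSES[char_class]
    rng = random.Random(seed)  # seeded so the scrambling is reproducible
    return "".join(
        rng.choice(alphabet) if ch in alphabet else ch
        for ch in code
    )


if __name__ == "__main__":
    snippet = "for (int i = 0; i < 10; i++) { total += i; }"
    for name in CHAR_CLASSES:
        print(f"{name:>10}: {scramble(snippet, name)}")
```

Under this assumption, each scrambled variant of a snippet would be rendered as an image and passed to the trained classifier. A larger drop in precision and recall for one class than for another would then suggest that the class contributes more to visual recognizability.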