Conference: IEEE/ACM International Conference on Mining Software Repositories

A Deep Learning Approach to Identifying Source Code in Images and Video

Abstract

While substantial progress has been made in mining code on an Internet scale, efforts to date have been overwhelmingly focused on data sets where source code is represented natively as text. Large volumes of source code available online and embedded in technical videos have remained largely unexplored, due in part to the complexity of extraction when code is represented with images. Existing approaches to code extraction and indexing in this environment rely heavily on computationally intense optical character recognition. To improve the ease and efficiency of identifying this embedded code, as well as identifying similar code examples, we develop a deep learning solution based on convolutional neural networks and autoencoders. Focusing on Java for proof of concept, our technique is able to identify the presence of typeset and handwritten source code in thousands of video images with 85.6%-98.6% accuracy based on syntactic and contextual features learned through deep architectures. When combined with traditional approaches, this provides a more scalable basis for video indexing that can be incorporated into existing software search and mining tools.
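To make the idea of classifying a frame from convolutional features concrete, the following is a minimal sketch of a single forward pass: convolve, apply ReLU, pool, and squash to a probability. It is not the authors' architecture. The filter values, weights, bias, toy 4x4 frames, and all function names are hypothetical and hand-set for illustration; a real model would learn its filters from labeled video frames.

```python
import math

def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def global_max_pool(feature_map):
    """Reduce a feature map to its single strongest response."""
    return max(max(row) for row in feature_map)

def code_probability(image, kernels, weights, bias):
    """Sigmoid over pooled filter responses: a score in (0, 1) for 'contains code'."""
    features = [global_max_pool(relu(conv2d(image, k))) for k in kernels]
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hand-set filters (assumption: a trained network would learn these).
HORIZONTAL_EDGE = [[1.0, 1.0], [-1.0, -1.0]]
VERTICAL_EDGE = [[1.0, -1.0], [1.0, -1.0]]
KERNELS = [HORIZONTAL_EDGE, VERTICAL_EDGE]
WEIGHTS = [2.0, 2.0]   # hypothetical output-layer weights
BIAS = -2.0            # hypothetical output-layer bias

# Toy frames: 1.0 marks an inked pixel. Alternating rows stand in for lines of text.
striped_frame = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
blank_frame = [[0.0] * 4 for _ in range(4)]

p_striped = code_probability(striped_frame, KERNELS, WEIGHTS, BIAS)
p_blank = code_probability(blank_frame, KERNELS, WEIGHTS, BIAS)
print(f"striped frame: {p_striped:.3f}, blank frame: {p_blank:.3f}")
```

With these made-up parameters the striped frame scores above 0.5 and the blank frame below it. The point is only that a cheap feature-based score can flag candidate frames, which is why such a classifier could run ahead of a costlier OCR pass.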