A Deep Learning Approach to Identifying Source Code in Images and Video

Abstract

While substantial progress has been made in mining code on an Internet scale, efforts to date have been overwhelmingly focused on data sets where source code is represented natively as text. Large volumes of source code available online and embedded in technical videos have remained largely unexplored, due in part to the complexity of extraction when code is represented with images. Existing approaches to code extraction and indexing in this environment rely heavily on computationally intense optical character recognition. To improve the ease and efficiency of identifying this embedded code, as well as identifying similar code examples, we develop a deep learning solution based on convolutional neural networks and autoencoders. Focusing on Java for proof of concept, our technique is able to identify the presence of typeset and handwritten source code in thousands of video images with 85.6%-98.6% accuracy based on syntactic and contextual features learned through deep architectures. When combined with traditional approaches, this provides a more scalable basis for video indexing that can be incorporated into existing software search and mining tools.

Bibliographic details

Source
IEEE/ACM International Conference on Mining Software Repositories | 2018 | pp. 376-386 | 11 pages
Conference location: Gothenburg (SE)
Authors
Jordan Ott; Abigail Atchison; Paul Harnack; Adrienne Bergh; Erik Linstead
Original format: PDF
Keywords
Deep learning; Data mining; Optical character recognition software; Convolutional neural networks; Tutorials

Similar literature

1. Feedback2Code: A Deep Learning Approach to Identifying User-Feedback-Related Source Code Files [J]. Shuhan Yan, Tianjiao Du, Beijun Shen, International Journal of Software Engineering and Knowledge Engineering. 2020, Issue 1
2. SpikeSegNet-a deep learning approach utilizing encoder-decoder network with hourglass for spike segmentation and counting in wheat plant from visual imaging [J]. Tanuj Misra, Alka Arora, Sudeep Marwaha, Plant Methods. 2020, Issue 1
3. A System For Identifying Synthetic Images Using LSTM: A Deep Learning Approach [J]. Hemanth Somasekar, Dr. Kavya Naveen, International Journal of Computer Trends and Technology. 2021, Issue 2
4. A Deep Learning Approach to Identifying Source Code in Images and Video [C]. Jordan Ott, Abigail Atchison, Paul Harnack, IEEE/ACM International Conference on Mining Software Repositories. 2018
5. A Deep Learning Approach for Identifying Key Biomarkers in Medical Imaging Applications [D]. Odaibo, David. 2019
6. SpikeSegNet-a deep learning approach utilizing encoder-decoder network with hourglass for spike segmentation and counting in wheat plant from visual imaging [O]. Tanuj Misra, Alka Arora, Sudeep Marwaha, 2020
