Code Localization in Programming Screencasts

Mohammad Alahmadi; Abdulkarim Khormi; Biswas Parajuli; Jonathan Hassel; Sonia Haiduc; Piyush Kumar

Abstract

Programming screencasts are growing in popularity and are often used by developers as a learning source. The source code shown in these screencasts is often not available for download or copy-pasting. Without having the code readily available, developers have to frequently pause a video to transcribe the code. This is time-consuming and reduces the effectiveness of learning from videos. Recent approaches have applied Optical Character Recognition (OCR) techniques to automatically extract source code from programming screencasts. One of their major limitations, however, is the extraction of noise such as the text information in the menu, package hierarchy, etc. due to the imprecise approximation of the code location on the screen. This leads to incorrect, unusable code. We aim to address this limitation and propose an approach to significantly improve the accuracy of code localization in programming screencasts, leading to a more precise code extraction. Our approach uses a Convolutional Neural Network to automatically predict the exact location of code in an image. We evaluated our approach on a set of frames extracted from 450 screencasts covering Java, C#, and Python programming topics. The results show that our approach is able to detect the area containing the code with 94% accuracy and that our approach significantly outperforms previous work. We also show that applying OCR on the code area identified by our approach leads to a 97% match with the ground truth on average, compared to only 31% when OCR is applied to the entire frame.
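
The idea of restricting OCR to a localized code area can be illustrated with a minimal sketch. This is not the authors' Convolutional Neural Network or their evaluation protocol, and the abstract does not say which metrics lie behind the reported figures. The sketch assumes the predicted code box is already available and that OCR has returned text regions with pixel boxes. It uses intersection-over-union as one common localization measure and a `difflib` ratio as one possible text-match measure. All names and sample values (`iou`, `text_in_box`, `similarity`, `regions`) are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

# A box is (left, top, right, bottom) in pixels. Assumed convention.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def text_in_box(regions: List[Tuple[Box, str]], code_box: Box) -> str:
    """Keep only OCR regions whose centre falls inside code_box."""
    kept = []
    for (left, top, right, bottom), text in regions:
        cx, cy = (left + right) / 2, (top + bottom) / 2
        if code_box[0] <= cx <= code_box[2] and code_box[1] <= cy <= code_box[3]:
            kept.append(text)
    return "\n".join(kept)


def similarity(extracted: str, truth: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, extracted, truth).ratio()


# Hypothetical OCR output for one 800x400 frame: menu bar, file tree, two code lines.
regions = [
    ((0, 0, 800, 20), "File Edit View"),
    ((0, 30, 150, 400), "src/main/App.java"),
    ((160, 30, 800, 50), "int x = 1;"),
    ((160, 60, 800, 80), "return x;"),
]
predicted_box = (155, 25, 800, 400)     # stand-in for a model prediction
ground_truth_box = (160, 30, 800, 400)  # stand-in for a manual annotation
ground_truth_code = "int x = 1;\nreturn x;"

whole_frame_text = text_in_box(regions, (0, 0, 800, 400))
code_area_text = text_in_box(regions, predicted_box)

if __name__ == "__main__":
    print("IoU:", iou(predicted_box, ground_truth_box))
    print("whole frame match:", similarity(whole_frame_text, ground_truth_code))
    print("code area match:", similarity(code_area_text, ground_truth_code))
```

In this toy frame, filtering by the predicted box discards the menu and file-tree text, which is the kind of noise the abstract attributes to imprecise localization.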

Bibliographic details

Source
Empirical Software Engineering, 2020, Issue 2, pp. 1536-1572 (37 pages)

Author affiliation
Florida State University, 600 W College Ave, Tallahassee, FL 32306, USA

Original format
PDF

Language of text
English

Keywords
Programming video tutorials; Software documentation; Source code; Deep learning; Video mining
