Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition

Hussain Rafaqat; Gao Hui; Shaikh Riaz Ahmed

Abstract

Over the last few years, CAPTCHAs have become ubiquitous on the internet as a security mechanism for distinguishing humans from spam bots. Text-based CAPTCHAs ask users to recognize distorted text in challenge images. Because they are based on a hard AI problem, they have emerged as a hot research topic in computer vision and machine learning. Contemporary text-based CAPTCHAs rely on the segmentation problem, that is, the difficulty of decomposing the image into sub-images of individual characters. This is a challenging task for current OCR programs and remains largely unsolved. In this paper, we present a novel segmentation and recognition method for text-based CAPTCHAs that uses simple image processing techniques, including thresholding, thinning and pixel count methods, along with an artificial neural network. We attack the popular CCT (Crowding Characters Together) based CAPTCHAs and compare our results with other schemes. Overall, our system achieves a precision of 51.3%, 27.1% and 53.2% on the Taobao, MSN and eBay datasets, which contain 1000, 500 and 1000 CAPTCHAs respectively. The benefits of this research are twofold: by recognizing text-based CAPTCHAs, we not only expose weaknesses in the current designs but also find a way to segment and recognize connected characters in images. The proposed algorithm can be used in the digitization of ancient books, handwriting recognition and other similar tasks.
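The abstract names thresholding and pixel count methods but does not give their details. The sketch below illustrates one common reading of those two steps: global thresholding followed by a per-column ink count (vertical projection), with an over-wide segment cut at its thinnest interior column. It is not the authors' algorithm. The function names, the threshold of 128, the `max_width` parameter and the quarter-width margin are all assumptions made for illustration, and the thinning and neural network stages are omitted.

```python
import numpy as np


def binarize(gray, threshold=128):
    """Global thresholding: pixels darker than `threshold` count as ink (1)."""
    return (np.asarray(gray) < threshold).astype(np.uint8)


def column_pixel_counts(binary):
    """Number of ink pixels in each column (vertical projection)."""
    return binary.sum(axis=0)


def split_on_empty_columns(counts):
    """Return half-open (start, end) column ranges that contain ink."""
    segments, start = [], None
    for x, count in enumerate(counts):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(counts)))
    return segments


def split_wide_segment(counts, segment, max_width):
    """Recursively cut a segment wider than `max_width` at its thinnest
    interior column. The margin keeps the cut away from the edges; this
    heuristic is an assumption, not taken from the paper."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    start, end = segment
    width = end - start
    if width <= max_width:
        return [segment]
    margin = max(1, width // 4)
    lo = start + margin
    hi = max(lo + 1, end - margin)
    cut = lo + int(np.argmin(counts[lo:hi]))
    return (split_wide_segment(counts, (start, cut), max_width)
            + split_wide_segment(counts, (cut, end), max_width))


def segment_columns(gray, max_width, threshold=128):
    """Column ranges of candidate characters in a grayscale image."""
    counts = column_pixel_counts(binarize(gray, threshold))
    pieces = []
    for seg in split_on_empty_columns(counts):
        pieces.extend(split_wide_segment(counts, seg, max_width))
    return pieces
```

Calling `segment_columns(image, max_width=w)` on a 2-D grayscale array returns half-open column ranges, each of which could be cropped and passed to a classifier. A fixed `max_width` is a crude stand-in for whatever character-width estimate a real attack would use.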

Bibliographic details

Source
Multimedia Tools and Applications | 2017, Issue 24 | pp. 25547-25561 | 15 pages
Authors
Hussain Rafaqat; Gao Hui; Shaikh Riaz Ahmed
Author affiliations

Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Sichuan, Peoples R China;

Shah Abdul Latif Univ, Dept Comp Sci, Khairpur 66020, Pakistan;

Original format: PDF
Language: English
Keywords
CAPTCHAs; Artificial Intelligence; Machine learning; Image processing; Crowding characters together; Intelligent character recognition;
