Journal: Multimedia Tools and Applications

Segmentation of connected characters in text-based CAPTCHAs for intelligent character recognition


Abstract

Over the last few years, CAPTCHAs have become ubiquitous on the internet as a security mechanism for distinguishing humans from spam bots. Text-based CAPTCHAs ask users to recognize distorted text in challenge images. Because they are based on a hard AI problem, they have emerged as a hot research topic in computer vision and machine learning. Contemporary text-based CAPTCHAs rely on the segmentation problem: decomposing the image into sub-images of individual characters. This is a challenging task for current OCR programs and remains largely unsolved. In this paper, we present a novel segmentation and recognition method for text-based CAPTCHAs that combines simple image processing techniques (thresholding, thinning and pixel count methods) with an artificial neural network. We attack popular CCT (Crowded Characters Together) based CAPTCHAs and compare our results with other schemes. Overall, our system achieves a precision of 51.3%, 27.1% and 53.2% on the Taobao, MSN and eBay datasets, which contain 1000, 500 and 1000 CAPTCHAs respectively. The benefits of this research are twofold: by recognizing text-based CAPTCHAs, we not only expose weaknesses in current designs but also find a way to segment and recognize connected characters in images. The proposed algorithm can be used in the digitization of ancient books, handwriting recognition and other similar tasks.
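The abstract names thresholding and pixel counting without giving details, so the following is only a rough illustration of how those two steps are commonly combined, not the paper's algorithm. It assumes dark text on a light background, represents the image as a plain list of rows, and uses hypothetical helper names (`threshold`, `column_pixel_counts`, `split_at_gaps`). Thinning and the neural-network recognizer are omitted.

```python
# Illustrative sketch only; NOT the method from the paper.
# Assumption: the image is a list of rows of grayscale values
# (0 = black, 255 = white) with dark text on a light background.

def threshold(image, cutoff=128):
    """Binarize: 1 marks an ink (text) pixel, 0 marks background."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]

def column_pixel_counts(binary):
    """Count ink pixels in each column (a vertical projection profile)."""
    return [sum(col) for col in zip(*binary)]

def split_at_gaps(counts, max_ink=0):
    """Return (start, end) column ranges whose ink count exceeds max_ink.

    With max_ink=0 only characters separated by an empty column are split.
    Raising max_ink also cuts at thin joins, a crude way to separate
    touching characters.
    """
    segments, start = [], None
    for x, c in enumerate(counts):
        if c > max_ink and start is None:
            start = x                      # a segment begins here
        elif c <= max_ink and start is not None:
            segments.append((start, x))    # segment ends before column x
            start = None
    if start is not None:
        segments.append((start, len(counts)))
    return segments

# Toy 4x9 image: two blobs joined by a one-pixel bridge in column 4.
W, B = 255, 0
toy = [
    [W, B, B, B, W, B, B, B, W],
    [W, B, B, B, B, B, B, B, W],
    [W, B, B, B, W, B, B, B, W],
    [W, B, B, B, W, B, B, B, W],
]
counts = column_pixel_counts(threshold(toy))
print(counts)                            # [0, 4, 4, 4, 1, 4, 4, 4, 0]
print(split_at_gaps(counts, max_ink=0))  # [(1, 8)]
print(split_at_gaps(counts, max_ink=1))  # [(1, 4), (5, 8)]
```

In the toy image the two blobs touch, so a gap-only split returns one segment. Allowing a cut where a column holds at most one ink pixel separates them, which hints at why connected characters make segmentation the hard part of the problem.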
