Conference: Conference on From Innovation to Impact

An Approach for Resolving Double Character Segmentation in Sinhala Social Media Text Images

Abstract

Sinhala, an official language of Sri Lanka, is written in a script descended from the ancient Brahmi script. Text extraction for Asian languages such as Sinhala and Tamil has received little attention. Sinhala text in digital images appears in varying fonts and font sizes, which makes extraction harder. Segmenting overlapping and touching Sinhala characters in digital images, and then recognizing the segmented characters effectively, is a challenging task. This paper outlines a software system designed and implemented to extract Sinhala text from digital images. A connected components labelling algorithm was used to segment overlapping characters, while a background-thinning-based approach was used to segment touching characters. The proposed system extracts Sinhala text from digital images with nearly 76% accuracy.
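
To illustrate why connected components labelling suits overlapping characters, the following is a minimal Python sketch, not the authors' implementation. It assumes a binarized image given as a list of rows (1 = ink, 0 = background) and 8-connectivity; the function names `label_components` and `bounding_boxes` and the toy image are hypothetical. The two glyphs in the example have overlapping bounding boxes, so a straight vertical cut could not separate them, yet labelling still isolates each one because their pixels never touch.

```python
from collections import deque


def label_components(image):
    """Label 8-connected foreground pixels (value 1) with breadth-first search.

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for each connected component.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already assigned to a component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit all 8 neighbours (the centre pixel is already labelled).
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels):
    """Return {label: (top, left, bottom, right)} for every component."""
    boxes = {}
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab == 0:
                continue
            if lab not in boxes:
                boxes[lab] = (r, c, r, c)
            else:
                top, left, bottom, right = boxes[lab]
                boxes[lab] = (min(top, r), min(left, c),
                              max(bottom, r), max(right, c))
    return boxes


# Toy image (an assumption for illustration): two glyphs whose bounding
# boxes overlap in both rows and columns, but whose pixels do not touch.
image = [
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 1, 1],
]
labels, count = label_components(image)
boxes = bounding_boxes(labels)
print(count, boxes)
```

Touching characters form a single component under this scheme, which is consistent with the abstract's use of a separate background-thinning-based approach for that case.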