Conference: Chinese Conference on Pattern Recognition and Computer Vision

A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm


Abstract

A benchmark database plays an essential role in evaluating the performance of segmentation algorithms for touching character strings. In this paper, we present a new database of touching Tibetan character strings. First, using previously proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and retain only the genuinely touching samples. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two touching characters and 1,399 images of more than two touching characters, and it can be used to evaluate segmentation algorithms for touching Tibetan character strings. For each image, the annotated ground truth file includes the class labels, the candidate segmentation points, the baseline, and the average stroke width of a single Tibetan character. According to the type of touching, we divide the touching character strings into three types: AB, OB and BB. We also count the number of samples of each type and find that 76.27% of the samples belong to the third type (BB). Finally, we measure the performance of an over-segmentation algorithm on this database for reference.
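The candidate-finding step can be illustrated with a minimal sketch. The abstract only says that connected component analysis is used; it does not say how candidates are chosen. The width-based rule below (flag components noticeably wider than the median component), the `width_ratio` parameter, the function names, and the toy binary image are all assumptions made for illustration, not the authors' method.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground regions.

    `img` is a list of equal-length rows containing 0 (background) or 1 (ink).
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def candidate_touching(boxes, width_ratio=1.5):
    """Flag components wider than `width_ratio` times the median width.

    Hypothetical heuristic: an unusually wide component may contain
    several characters that touch each other.
    """
    if not boxes:
        return []
    widths = [x1 - x0 + 1 for x0, _, x1, _ in boxes]
    typical = median(widths)
    return [b for b, wd in zip(boxes, widths) if wd > width_ratio * typical]


# Toy text line: two isolated glyphs (3 px wide) and one merged blob (7 px wide).
line = [
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
]
boxes = connected_components(line)
candidates = candidate_touching(boxes)
```

In a pipeline like the one described, such automatically flagged candidates would still need the manual screening and annotation the abstract mentions, since a wide component is not necessarily a touching string.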
