International Conference on Document Analysis and Recognition

OBC306: A Large-Scale Oracle Bone Character Recognition Dataset

Abstract

The oracle bone script of ancient China is among the world's most famous ancient writing systems. Identifying and deciphering oracle bone scripts is one of the most important topics in oracle bone study and requires a deep familiarity with the culture of ancient China. The task remains very challenging for two reasons. First, it is carried out mainly by humans and demands a high level of experience, aptitude, and commitment. Second, domain-specific data is scarce, which hinders the advancement of automatic recognition research. A collection of well-labeled oracle bone data is necessary to bridge the oracle bone and information processing fields; however, such a dataset has not yet been presented. Hence, in this paper, we construct a new large-scale dataset of oracle bone characters called OBC306. The dataset contains more than 300,000 character-level samples cropped from oracle bone rubbings or images. It covers 306 glyph classes and is, to the best of our knowledge, the largest existing raw oracle bone character set. We also present a standard deep convolutional neural network-based evaluation of this dataset to serve as a benchmark. Through statistical and visual analyses, we describe the inherent difficulties of oracle bone recognition and propose future challenges for, and extensions of, oracle bone study using information processing. We anticipate that the publication of this dataset will facilitate the development of oracle bone research and lead to optimal algorithmic solutions.
