OBC306: A Large-Scale Oracle Bone Character Recognition Dataset

Abstract

The oracle bone script from ancient China is among the world's most famous ancient writing systems. Identifying and deciphering oracle bone scripts is one of the most important topics in oracle bone study and requires a deep familiarity with the culture of ancient China. This task remains very challenging for two reasons. The first is that it is executed mainly by humans and requires a high level of experience, aptitude, and commitment. The second is the scarcity of domain-specific data, which hinders the advancement of automatic recognition research. A collection of well-labeled oracle-bone data is necessary to bridge the oracle bone and information processing fields; however, such a dataset has not yet been presented. Hence, in this paper, we construct a new large-scale dataset of oracle bone characters called OBC306. We also present the standard deep convolutional neural network-based evaluation for this dataset to serve as a benchmark. Through statistical and visual analyses, we describe the inherent difficulties of oracle bone recognition and propose future challenges for and extensions of oracle bone study using information processing. This dataset contains more than 300,000 character-level samples cropped from oracle-bone rubbings or images. It covers 306 glyph classes and is the largest existing raw oracle-bone character set, to the best of our knowledge. It is anticipated that the publication of this dataset will facilitate the development of oracle bone research and lead to optimal algorithmic solutions.

Bibliographic details

Source
International Conference on Document Analysis and Recognition | 2019 | pp. 681-688 | 8 pages
Authors
Shuangping Huang; Haobin Wang; Yongge Liu; Xiaosong Shi; Lianwen Jin
Original format: PDF
Keywords
Bones; Dictionaries; Character recognition; Task analysis; Tools; Visualization


