Conference: International Conference on Document Analysis and Recognition

Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets

Abstract

The number of known cuneiform tablets is assumed to be in the hundreds of thousands. The Hilprecht Archive Online contains 1,977 high-resolution 3D scans of tablets. The online cuneiform database CDLI catalogs metadata for more than 100,000 tablets. While both are publicly accessible, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive: the data is only accessible by searching web pages, the tablet identifiers are inconsistent between collections, and the 3D data is unprepared and challenging for automated processing. We pave the way for large-scale analyses of cuneiform tablets by assembling a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning alignment between 3D data, images, and linguistic expression. This is the first dataset of its kind and size in cuneiform research. The benchmark dataset is prepared for ease of use and immediate availability for computational research, lowering the barrier to experimenting with and applying standard methods of analysis. It is available at https://doi.org/10.11588/data/IE8CCN.
