IAPR International Conference on Document Analysis and Recognition

Automating Transliteration of Cuneiform from Parallel Lines with Sparse Data

Abstract

Cuneiform tablets are among the oldest textual artifacts, and their extent is comparable to that of the texts written in Latin or ancient Greek. The Cuneiform Commentaries Project (CCP) at Yale University provides tracings of cuneiform tablets together with annotated transliterations and translations. As part of our work on analyzing cuneiform script computationally with 3D acquisition and word spotting, we present a first approach to automatically learning transliterations of cuneiform tablets from a corpus of parallel lines. These parallel lines consist of manually drawn cuneiform characters and their transliteration into an alphanumeric code. Since the cuneiform characters are only available as raster data, we segment lines with a projection profile, extract Histogram of Oriented Gradients (HoG) features, detect outliers caused by tablet damage, and align those features with the transliteration. We apply methods from part-of-speech tagging to learn a correspondence between features and transliteration tokens. We evaluate point-wise classification with K-Nearest Neighbors (KNN) and a Support Vector Machine (SVM), and sequence classification with a Hidden Markov Model (HMM) and a Structured Support Vector Machine (SVM-HMM). From our findings we conclude that the sparsity of the data, inconsistent labeling, and the variety of tracing styles currently do not allow fully automated transliteration with the presented approach. Nevertheless, automated learning of transliterations remains highly relevant, because manual annotation at a larger scale is not viable given how few experts are able to transcribe cuneiform tablets.
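The line-segmentation step can be illustrated with a horizontal projection profile: count the ink pixels in each row of a tracing and treat each run of rows above a threshold as one text line. This is a minimal sketch, not the authors' implementation. It assumes the tracing has already been binarized into a list of rows where 1 marks ink, and the `threshold` and `min_height` parameters are hypothetical choices, since the abstract does not state the settings used.

```python
from typing import List, Optional, Tuple


def horizontal_profile(image: List[List[int]]) -> List[int]:
    """Count ink pixels (value 1) in each row of a binarized raster."""
    return [sum(row) for row in image]


def segment_lines(
    image: List[List[int]], threshold: int = 0, min_height: int = 1
) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows whose
    projection-profile value exceeds `threshold` (hypothetical parameters)."""
    profile = horizontal_profile(image)
    lines: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y  # a text line begins
        elif value <= threshold and start is not None:
            if y - start >= min_height:  # drop runs that are too thin
                lines.append((start, y))
            start = None
    # close a line that runs to the bottom edge of the image
    if start is not None and len(profile) - start >= min_height:
        lines.append((start, len(profile)))
    return lines


# Toy binarized "tracing": two text lines separated by empty rows.
tablet = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
print(segment_lines(tablet))  # [(1, 3), (4, 5)]
```

Raising `min_height` discards thin runs such as isolated scratches; on the toy example, `segment_lines(tablet, min_height=2)` keeps only `(1, 3)`.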
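For the HMM-based sequence classification, decoding a line amounts to finding the most likely sequence of transliteration tokens for a sequence of observed features, which the Viterbi algorithm computes. The sketch below is illustrative only. It assumes the HoG features have been quantized into discrete cluster IDs (the hypothetical `c0`, `c1`), and the token labels `an` and `ki` and all probabilities are invented toy values, not estimates from the CCP corpus.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    """Log probability, mapping zero probability to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Most likely hidden-state sequence of a discrete HMM, in log space."""
    if not observations:
        return [], 0.0
    # best[t][s]: log-prob of the best path that ends in state s at step t
    best = [{s: _log(start_p[s]) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # best predecessor r for entering state s at step t
            prev, score = max(
                ((r, best[t - 1][r] + _log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + _log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])  # follow back-pointers
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: two transliteration tokens, two feature clusters.
states = ["an", "ki"]
start_p = {"an": 0.6, "ki": 0.4}
trans_p = {"an": {"an": 0.3, "ki": 0.7}, "ki": {"an": 0.6, "ki": 0.4}}
emit_p = {"an": {"c0": 0.8, "c1": 0.2}, "ki": {"c0": 0.1, "c1": 0.9}}

path, score = viterbi(["c0", "c1", "c0"], states, start_p, trans_p, emit_p)
print(path)                          # ['an', 'ki', 'an']
print(round(math.exp(score), 6))     # 0.145152
```

Working in log space avoids numerical underflow on long lines. A structured model such as SVM-HMM replaces the probabilities with learned scores but uses a comparable Viterbi-style decoding step.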
