Venue: International Conference on Document Analysis and Recognition
Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts

Abstract

Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first ever dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenge of large diversity in scripts and the presence of dense, irregular layout elements (e.g., text lines, pictures, multiple documents per image), we adapt a Fully Convolutional Deep Neural Network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility, and keeping the non-technical background of domain experts in mind, we also contribute a custom, web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word-spotting in historical Indic manuscripts at scale.