Conference: IAPR International Workshop on Document Analysis Systems

Collecting Handwritten Nom Character Patterns from Historical Document Pages

Abstract

In this paper, we present methods for segmenting Nom historical documents and clustering character patterns to build a Nom character pattern database. Nom is an ideographic script for writing Vietnamese that was used from the 10th century to the 20th century. However, this heritage is nearly lost. To preserve the wisdom and knowledge expressed in Nom, recognition and digitization are indispensable. Because there is no OCR for Nom yet, we have to start by collecting patterns. We employed a projection-profile-based method to segment hundreds of pages into individual characters. We then implemented a combination of Chinese OCR-based clustering and K-means clustering to group the characters into categories. The experiment shows that the proposed system can help collect character patterns effectively. Moreover, it revealed that many character classes have so far been lost or left uncategorized.
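
The projection-profile segmentation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binarized page (1 = ink, 0 = background) with text in vertical columns separated by blank gaps, and the function names `profile`, `runs`, and `segment_characters` are hypothetical. It covers the segmentation step only, not the OCR-based or K-means clustering.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed input: binary image, 1 = ink, 0 = background


def profile(image: Image, axis: int) -> List[int]:
    """Ink count per column (axis=0) or per row (axis=1)."""
    if axis == 0:
        return [sum(col) for col in zip(*image)]
    return [sum(row) for row in image]


def runs(prof: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges where the profile exceeds threshold."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(prof):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            spans.append((start, i))       # a gap ends the current run
            start = None
    if start is not None:                  # run reaches the image border
        spans.append((start, len(prof)))
    return spans


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Return (top, bottom, left, right) boxes, assuming vertical text columns."""
    boxes = []
    # Step 1: the column-wise profile splits the page into text columns.
    for left, right in runs(profile(image, axis=0)):
        column = [row[left:right] for row in image]
        # Step 2: the row-wise profile inside each column splits characters.
        for top, bottom in runs(profile(column, axis=1)):
            boxes.append((top, bottom, left, right))
    return boxes
```

Calling `segment_characters(page)` on a small binary grid returns one box per connected run of inked rows within each inked column band. Real historical pages are noisy, so a nonzero `threshold` and some handling of touching or broken characters would likely be needed; the abstract does not say how the authors address this.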
