Collecting Handwritten Nom Character Patterns from Historical Document Pages

Abstract

In this paper, we present methods for segmenting Nom historical documents and clustering character patterns to build a Nom character pattern database. Nom is an ideographic script for writing Vietnamese that was in use from the 10th century to the 20th century. However, this heritage is nearly lost. To preserve the wisdom and knowledge expressed in Nom, recognition and digitization are indispensable. Because there is no OCR for Nom yet, we have to start by collecting patterns. We employed a projection-profile-based method to segment hundreds of pages into individual characters. We then implemented a combination of Chinese OCR-based clustering and K-means clustering to group the characters into categories. The experiment shows that the proposed system can help collect character patterns effectively. Moreover, it has revealed that many character classes have been lost or remain uncategorized so far.
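The abstract names projection-profile segmentation but gives no details, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a binarized page (1 = ink, 0 = background) laid out in vertical text columns separated by blank gaps; the function names, the `threshold` parameter, and the box format are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background (assumed input)


def projection_profile(image: Image, axis: str) -> List[int]:
    """Count ink pixels per pixel column (axis="x") or per pixel row (axis="y")."""
    if axis == "x":
        return [sum(col) for col in zip(*image)]
    return [sum(row) for row in image]


def ink_runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) spans where the profile exceeds threshold."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i  # a run of ink begins
        elif value <= threshold and start is not None:
            runs.append((start, i))  # a gap closes the current run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_characters(image: Image) -> List[Tuple[int, int, int, int]]:
    """Split the page into text columns, then each column into character boxes.

    Returns (left, top, right, bottom) boxes with exclusive right and bottom.
    """
    boxes = []
    for left, right in ink_runs(projection_profile(image, "x")):
        column = [row[left:right] for row in image]
        for top, bottom in ink_runs(projection_profile(column, "y")):
            boxes.append((left, top, right, bottom))
    return boxes
```

A sketch this simple splits a character wherever a blank row falls inside it, and it returns columns in left-to-right order; a real pipeline would need to merge such fragments and handle noise, skew, and reading order.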

Bibliographic details

Source: IAPR International Workshop on Document Analysis Systems, 2012, 5 pages
Author: Truyen Van Phan
Original format: PDF
Chinese Library Classification: TP391-53
