Recognizing Vietnamese Online Handwritten Separated Characters

Abstract

The Vietnamese alphabet is based on the Latin alphabet with the addition of nine accent marks, or diacritics: four of them create additional sounds, and the other five indicate the tone of each word. Because Vietnamese is a tonal language that uses tone to distinguish words, recognizing diacritics is an important part of recognizing Vietnamese words. In written form, however, diacritics are much smaller than the characters, which makes them very hard to recognize. Previous work on Vietnamese character recognition often pre-processes the input with a graph-based approach, trying to separate the main characters from their diacritics by determining connected regions at the pixel level. This approach, however, only works well when the input contains only characters with separable diacritics, for example, scanned images of printed documents. In this paper, we propose a robust method to recognize online Vietnamese characters with diacritics. Using a cosine transform with appropriate sampling algorithms, we represent the multiple strokes of a character together in a single set of features. This feature set is then used as the input to a well-designed machine-learning-based system. We have tested our system on a combination of Vietnamese characters with diacritics and Section 1c (isolated characters) of the Unipen data set, and have obtained very competitive results.
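The abstract describes the feature extraction only at a high level. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes (none of this is stated in the abstract) that all strokes of a character, including its diacritics, are joined in writing order into one trajectory; that the "sampling algorithm" is arc-length resampling to a fixed number of points; and that the cosine transform is an orthonormal DCT-II applied separately to the x and y sequences, keeping only the first few coefficients. The function names and the parameters `n_points=64` and `n_coeffs=10` are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def resample(points: Sequence[Point], n: int) -> List[Point]:
    """Resample a polyline to n points spaced evenly by arc length."""
    if n < 2 or not points:
        raise ValueError("need n >= 2 and at least one point")
    if len(points) == 1:
        return [points[0]] * n
    # Cumulative arc length at each original point.
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1]
    if total == 0.0:
        return [points[0]] * n
    out = []
    j = 0  # index of the current segment points[j] -> points[j + 1]
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while j < len(points) - 2 and dists[j + 1] < target:
            j += 1
        seg = dists[j + 1] - dists[j]
        t = 0.0 if seg == 0.0 else (target - dists[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def dct2(values: Sequence[float], k: int) -> List[float]:
    """First k coefficients of the orthonormal DCT-II of values."""
    n = len(values)
    coeffs = []
    for u in range(k):
        s = sum(v * math.cos(math.pi / n * (i + 0.5) * u) for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def character_features(strokes: Sequence[Sequence[Point]],
                       n_points: int = 64, n_coeffs: int = 10) -> List[float]:
    """Hypothetical fixed-length feature vector for a multi-stroke character."""
    # Assumption: join all strokes (base letter and diacritics) in writing order,
    # so each pen-up gap becomes a straight connecting segment after resampling.
    trajectory = [p for stroke in strokes for p in stroke]
    pts = resample(trajectory, n_points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    # Normalize position and size so features ignore where and how large it was written.
    cx, cy = sum(xs) / n_points, sum(ys) / n_points
    extent = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    xs = [(x - cx) / extent for x in xs]
    ys = [(y - cy) / extent for y in ys]
    # Low-order cosine coefficients keep the coarse shape and drop fine jitter.
    return dct2(xs, n_coeffs) + dct2(ys, n_coeffs)


# Hypothetical "ô": a circular "o" stroke followed by a circumflex stroke above it.
o_stroke = [(math.cos(2 * math.pi * t / 16), math.sin(2 * math.pi * t / 16)) for t in range(17)]
circumflex = [(-0.5, 1.4), (0.0, 1.9), (0.5, 1.4)]
features = character_features([o_stroke, circumflex])
print(len(features))  # 20: ten x coefficients followed by ten y coefficients
```

Because the diacritic stroke is folded into the same trajectory as the base letter, every character, with or without diacritics, maps to one vector of the same length. Under these assumptions, no pixel-level connected-region separation step is needed before the vector is passed to a classifier.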