Recognition of printed Arabic text based on global features and decision tree learning techniques

Amin A.

Abstract

Machine simulation of human reading has been the subject of intensive research for almost three decades. A large number of research papers and reports have already been published on Latin, Chinese and Japanese characters. However, little work has been conducted on the automatic recognition of Arabic characters, either on-line or off-line. This is a result of the lack of adequate support in terms of funding and of other utilities such as Arabic text databases and dictionaries, and of course of the cursive nature of Arabic writing rules; the problem remains an open research field. This paper presents a new technique for the recognition of Arabic text using the C4.5 machine learning system. The advantages of machine learning are twofold: it can generalize over the large degree of variation between different fonts and writing styles, and recognition rules can be constructed from examples. The technique can be divided into three major steps. The first step is digitization and pre-processing to create connected components, detect the skew of a document image and correct it. The second step is feature extraction, in which global features of the input Arabic word are extracted, such as the number of subwords, the number of peaks within each subword, and the number and position of the complementary characters, in order to avoid the difficulty of a segmentation stage. Finally, the C4.5 machine learning system is used to generate a decision tree for classifying each word. The system was tested with 1000 Arabic words in different fonts (each word has 15 samples), and the average correct recognition rate obtained using cross-validation was 92%. (C) 2000 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. [References: 44]
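The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names, under stated assumptions. `count_components` counts 8-connected groups of ink pixels in a tiny binary image, a rough stand-in for a global feature such as the number of subwords (in real Arabic text, dots and other complementary characters would also form separate components and would need separate handling). `gain_ratio` computes the gain ratio, the split criterion C4.5 uses to rank discrete features. All function names and the toy images are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, deque


def count_components(grid):
    """Count 8-connected groups of 1-pixels in a binary image (list of lists)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # New component found: flood-fill it with a breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete feature: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # the feature takes a single value and cannot split the data
    return (entropy(labels) - remainder) / split_info


# Toy images (hypothetical): 1 = ink, 0 = background.
WORD_A = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
WORD_B = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]

# One feature value per sample, paired with toy class labels.
features = [count_components(img) for img in (WORD_A, WORD_A, WORD_B, WORD_B)]
labels = ["word_a", "word_a", "word_b", "word_b"]
ratio = gain_ratio(features, labels)
```

In this toy setup the component count separates the two classes perfectly, so it would be chosen as the first split. A full system would compute several such features per word and let the tree learner pick among them at each node.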

Bibliographic record

Source
Pattern Recognition: The Journal of the Pattern Recognition Society | 2000, Issue 8 | 15 pages
Author
Amin A.
Original format: PDF
Language: English
Chinese Library Classification: Automation technology and equipment
Keywords
Pattern recognition; Printed Arabic text; Connected component; Skew detection and correction; Global features; Structural classification; Machine learning C4.5; Cross-validation; Character recognition
