Script identification has important applications in document image information retrieval. Existing identification methods based on statistical features and symbol matching are sensitive to font variation. To address this problem, an East Asian script identification algorithm based on multi-feature fusion is proposed. The algorithm first analyzes and extracts high-frequency token shape features, layout features, and character complexity features, and then identifies the script using the closeness-degree criterion of fuzzy sets. Experimental results show that the algorithm achieves high identification accuracy and is robust to different fonts.
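The identification step can be illustrated with a minimal sketch. The abstract does not state which closeness-degree formula the authors use, so the min-max closeness below is an assumption, and the function names, the three-element feature layout (shape, layout, complexity), and the prototype membership values are all hypothetical, chosen only for illustration.

```python
from typing import Dict, Sequence


def closeness(a: Sequence[float], b: Sequence[float]) -> float:
    """Min-max closeness degree between two fuzzy sets given as
    membership vectors in [0, 1]: sum(min(a_i, b_i)) / sum(max(a_i, b_i)).
    This particular formula is an assumption, not taken from the paper."""
    if len(a) != len(b):
        raise ValueError("membership vectors must have the same length")
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    # Two all-zero vectors are identical, so treat them as fully close.
    return 1.0 if denominator == 0 else numerator / denominator


def identify(sample: Sequence[float],
             prototypes: Dict[str, Sequence[float]]) -> str:
    """Nearest-prototype rule: return the script whose prototype fuzzy set
    has the largest closeness degree to the sample."""
    return max(prototypes, key=lambda name: closeness(sample, prototypes[name]))


# Hypothetical prototype memberships, ordered as
# (shape feature, layout feature, complexity feature).
# These values are illustrative only and are not results from the paper.
PROTOTYPES: Dict[str, Sequence[float]] = {
    "Chinese": [0.9, 0.2, 0.8],
    "Japanese": [0.5, 0.6, 0.5],
    "Korean": [0.2, 0.9, 0.4],
}

if __name__ == "__main__":
    print(identify([0.85, 0.25, 0.75], PROTOTYPES))
```

In this sketch each script is represented by a single prototype vector, and the fused features of a document image would first have to be normalized to memberships in [0, 1]. How the features are actually extracted and fused is not described in the abstract and is left out here.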