Journal of Jiangxi University of Science and Technology (《江西理工大学学报》)

Research on Text Image Classification Based on a Combination of Low-Level Image Features

     

Abstract

This paper exploits image features that are characteristic of text images to propose a text image classification method based on a combination of low-level image features. Using a two-layer C4.5 decision tree classifier, the method divides text images into three classes: caption text images, document images, and scene text images. Classification proceeds in two steps. First, each sample image is converted to grayscale and features of its gray-level histogram are extracted. Because document images differ from the other two classes in these histogram features, they are separated out at this stage. Second, the remaining images are converted to binary images and their gray-level co-occurrence matrix (GLCM) texture features are extracted, and these GLCM features are used to distinguish scene text images from caption text images. Simulation experiments carried out in the open-source WEKA data mining software show that the method is feasible and achieves high recall and precision.
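The abstract names its features only at the level of "gray histogram features" and "GLCM texture features" and does not publish the trained trees. The following minimal sketch illustrates the shape of the two-stage pipeline under stated assumptions. The histogram statistics (mean, variance, skewness, energy, entropy), the fixed binarization threshold of 128, the single horizontal GLCM offset, and the four GLCM features are common textbook choices, not the paper's confirmed feature set. `is_document` and `is_scene` are hypothetical callables that stand in for the two C4.5 trees the authors trained in WEKA.

```python
import math
from collections import Counter

# Assumption: a grayscale image is a list of rows of ints in 0..255.

def gray_histogram_features(gray, levels=256):
    """Statistics of the normalized gray-level histogram (assumed feature set)."""
    counts = Counter(p for row in gray for p in row)
    n = sum(counts.values())
    p = [counts.get(g, 0) / n for g in range(levels)]
    mean = sum(g * pg for g, pg in enumerate(p))
    var = sum((g - mean) ** 2 * pg for g, pg in enumerate(p))
    std = math.sqrt(var)
    skew = (sum((g - mean) ** 3 * pg for g, pg in enumerate(p)) / std ** 3) if std else 0.0
    energy = sum(pg * pg for pg in p)
    entropy = -sum(pg * math.log2(pg) for pg in p if pg > 0)
    return {"mean": mean, "variance": var, "skewness": skew,
            "energy": energy, "entropy": entropy}


def binarize(gray, threshold=128):
    """Fixed-threshold binarization (the paper's binarization method is not stated)."""
    return [[1 if p >= threshold else 0 for p in row] for row in gray]


def glcm_features(binary, dx=1, dy=0):
    """Texture features of the 2x2 GLCM of a 0/1 image for one pixel offset (dx, dy)."""
    m = [[0, 0], [0, 0]]
    h, w = len(binary), len(binary[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[binary[y][x]][binary[y2][x2]] += 1
    total = sum(sum(r) for r in m)
    if total == 0:
        raise ValueError("image too small for the chosen offset")
    p = [[c / total for c in r] for r in m]
    pairs = [(i, j) for i in range(2) for j in range(2)]
    return {
        "contrast": sum((i - j) ** 2 * p[i][j] for i, j in pairs),
        "energy": sum(p[i][j] ** 2 for i, j in pairs),
        "homogeneity": sum(p[i][j] / (1 + abs(i - j)) for i, j in pairs),
        "entropy": -sum(p[i][j] * math.log2(p[i][j]) for i, j in pairs if p[i][j] > 0),
    }


def classify(gray, is_document, is_scene):
    """Two-stage cascade: histogram stage first, GLCM stage only for non-documents.

    is_document and is_scene are hypothetical stand-ins for the two trained C4.5 trees.
    """
    if is_document(gray_histogram_features(gray)):
        return "document"
    return "scene" if is_scene(glcm_features(binarize(gray))) else "caption"
```

The cascade structure means the GLCM stage only ever sees images that the histogram stage did not label as documents, so each tree solves a binary problem. In WEKA, C4.5 is provided as the `J48` classifier, so one plausible workflow is to export the feature dictionaries for each stage as training rows and fit one `J48` tree per stage.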
