Unsupervised language model adaptation for handwritten Chinese text recognition


Abstract

This paper presents an effective approach to unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. Since the domain of a document to be recognized is variable and usually unknown a priori, we use a two-pass recognition strategy with a pre-defined set of multi-domain language models. We propose three methods to dynamically generate an adaptive language model matching the first-pass recognition output: model selection, model combination, and model reconstruction. In model selection, we choose the language model with minimum perplexity on the first-pass recognized text. In model combination, we learn the combination weights by minimizing the sum of squared errors with both L2-norm and L1-norm regularization. In model reconstruction, we reconstruct a language model from a group of orthogonal bases, with the coefficients learned to match the document to be recognized. Moreover, we reduce the storage size of the multiple language models using two compression methods: split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases, CASIA-HWDB and HIT-MW, show that the proposed unsupervised LMA approach improves recognition performance markedly, particularly for documents in the ancient domain, where recognition accuracy improves by 7 percent. Meanwhile, combining the two compression methods greatly reduces the storage size of the language models with little loss of recognition accuracy.
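The model-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses toy unigram models with hypothetical vocabularies and probabilities in place of the paper's domain-specific n-gram models, and a simple probability floor for out-of-vocabulary words.

```python
import math

def perplexity(model, text, floor=1e-8):
    """Perplexity of a unigram model (word -> probability) on a
    token list; lower perplexity means a better domain match."""
    log_prob = 0.0
    for word in text:
        # Unseen words get a small floor probability (hypothetical choice).
        log_prob += math.log(model.get(word, floor))
    return math.exp(-log_prob / len(text))

def select_model(models, first_pass_text):
    """Model selection: pick the domain LM with minimum perplexity
    on the first-pass recognition output."""
    return min(models, key=lambda name: perplexity(models[name], first_pass_text))
```

For example, given one "news" and one "ancient" toy model, a first-pass output drawn from news vocabulary selects the news model.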
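The model-combination step with L2-norm regularization admits a closed-form sketch under simplifying assumptions: each column of a matrix M holds one domain LM's probabilities over a shared event list, and p is the empirical distribution of the first-pass recognized text. The names and the renormalization step here are illustrative; the paper's L1-norm variant needs an iterative solver and is omitted.

```python
import numpy as np

def combination_weights(M, p, lam=0.1):
    """L2-regularized least squares (ridge) in closed form:
    w = (M^T M + lam*I)^{-1} M^T p."""
    k = M.shape[1]
    return np.linalg.solve(M.T @ M + lam * np.eye(k), M.T @ p)

def combined_model(M, w):
    """Adaptive LM as a weighted combination of the domain models,
    clipped to be nonnegative and renormalized to a distribution."""
    q = np.clip(M @ w, 0.0, None)
    return q / q.sum()
```

With lam=0 and a target distribution equal to one of the columns, the learned weight vector recovers that column exactly.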
