首页>
外国专利>
Automatic training of character templates using a text line image, a text line transcription and a line image source model
Automatic training of character templates using a text line image, a text line transcription and a line image source model
展开▼
机译:使用文本行图像,文本行转录和行图像源模型自动训练字符模板
展开▼
页面导航
摘要
著录项
相似文献
摘要
A technique for automatically producing, or training, a set of bitmapped character templates defined according to the sidebearing model of character image positioning uses as input a text line image of unsegmented characters, called glyphs, as the source of training samples. The training process also uses a transcription associated with the text line image, and an explicit, grammar-based text line image source model that describes the structural and functional features of a set of possible text line images that may be used as the source of training samples. The transcription may be a literal transcription of the line image, or it may be nonliteral, for example containing logical structure tags for document formatting and layout, such as found in markup languages. Spatial positioning information modeled by the text line image source model and the labels in the transcription are used to determine labeled image positions identifying the location of glyph samples occurring in the input line image, and the character templates are produced using the labeled image positions. In another aspect of the technique, a set of character templates defined by any character template model, such as a segmentation-based model, is produced using the grammar- based text line image source model and specifically using a tag transcription containing logical structure tags for document formatting and layout. Both aspects of the training technique may represent the text line image source model and the transcription as finite state networks.
展开▼