TEACHING LANGUAGE MODELS USING TEXT CORPUSES CONTAINING REALISTIC ERRORS OF OPTICAL CHARACTER RECOGNITION (OCR)
Abstract
FIELD: data processing.
SUBSTANCE: the invention relates to generating a text corpus that contains realistic optical character recognition (OCR) errors, and to training language models on such corpora. An example method comprises: generating, by a computer system, an initial set of images from an input text corpus; applying, by the computer system, one or more simulated defects to images of the initial set to produce an augmented set of images; generating an output text corpus from the augmented set of images; and training a language model for optical character recognition on the resulting text corpus.
EFFECT: improved image recognition quality.
20 claims, 8 drawings.
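
The four steps named in the abstract can be illustrated with a toy, standard-library-only sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 3x5 bitmap `FONT`, pixel flipping as the "simulated defect", a nearest-glyph matcher standing in for a real OCR engine, and character bigram counts standing in for language model training.

```python
import random
from collections import Counter

# Hypothetical 3x5 bitmap font (15 pixels per glyph, row-major). Illustration only.
FONT = {
    "o": "111101101101111",
    "c": "111100100100111",
    "e": "111100111100111",
    "l": "100100100100111",
    "i": "010010010010010",
    "t": "111010010010010",
    " ": "000000000000000",
}

def render(text):
    """Step 1: turn text into an initial set of glyph images (lists of 0/1 pixels)."""
    return [[int(p) for p in FONT[ch]] for ch in text]

def add_defects(images, flip_prob, rng):
    """Step 2: simulate a defect by flipping each pixel with probability flip_prob."""
    return [[p ^ (rng.random() < flip_prob) for p in img] for img in images]

def recognize(images):
    """Step 3: toy OCR that picks the glyph with the smallest Hamming distance."""
    def closest(img):
        return min(FONT, key=lambda ch: sum(a != int(b) for a, b in zip(img, FONT[ch])))
    return "".join(closest(img) for img in images)

def build_noisy_corpus(corpus, flip_prob=0.15, seed=0):
    """Steps 1-3 chained: clean text in, text with OCR-like errors out."""
    rng = random.Random(seed)
    return [recognize(add_defects(render(line), flip_prob, rng)) for line in corpus]

def train_bigram_model(corpus):
    """Step 4: count character bigrams as a stand-in for language model training."""
    counts = Counter()
    for line in corpus:
        counts.update(zip(line, line[1:]))
    return counts

if __name__ == "__main__":
    clean = ["cool toilet", "elite colt"]
    noisy = build_noisy_corpus(clean)
    print(noisy)
    print(train_bigram_model(noisy).most_common(3))
```

Because the errors come from degrading the images rather than from random character substitution, the confusions follow visual similarity (in this toy font, `c` and `e` differ by only two pixels), which is the sense in which such errors are "realistic". A practical version would presumably use a real text renderer, richer defect models, and an actual OCR engine.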