Multi-task learning for simultaneous script identification and keyword spotting in document images

Cheikhrouhou Ahmed; Kessentini Yousri; Kanoun Slim


Abstract

In this paper, an end-to-end multi-task deep neural network is proposed for simultaneous script identification and keyword spotting (KWS) in multi-lingual handwritten and printed document images. We introduce a unified approach that addresses both challenges cohesively by designing a novel CNN-BLSTM architecture. The script identification stage extracts both local and global features so that the network covers more relevant information. Contrary to traditional feature fusion approaches, which build a linear feature concatenation, we employ compact bilinear pooling to capture pairwise correlations between these features. The script identification result is then injected into the KWS module to eliminate characters of irrelevant scripts and to perform the decoding stage in single-script mode. All network parameters are trained end to end using multi-task learning that jointly minimizes the negative log-likelihood (NLL) loss for script identification and the connectionist temporal classification (CTC) loss for KWS. Our approach was evaluated on a variety of public datasets covering different languages and writing types. Experiments demonstrated the efficacy of our deep multi-task representation learning compared to state-of-the-art systems for both keyword spotting and script identification. (c) 2021 Elsevier Ltd. All rights reserved.
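
As a rough illustration of the script-conditioned decoding step described in the abstract, the sketch below masks out characters that do not belong to the identified script and then decodes the remaining per-frame scores. The toy alphabets, the function names `mask_to_script` and `greedy_ctc_decode`, and the use of greedy best-path CTC decoding are all assumptions made for illustration; the abstract does not say how the masking or the decoder is implemented.

```python
import math

BLANK = "<blank>"  # CTC blank symbol (the name is an assumption)

# Hypothetical toy alphabets; a real system would use full character sets.
SCRIPT_ALPHABETS = {
    "latin": set("ab"),
    "arabic": set("\u0627\u0628"),  # ALEF, BEH
}


def mask_to_script(frame_log_probs, script):
    """Set the score of every character outside `script` to -inf (blank is kept)."""
    allowed = SCRIPT_ALPHABETS[script] | {BLANK}
    return [
        {ch: (lp if ch in allowed else -math.inf) for ch, lp in frame.items()}
        for frame in frame_log_probs
    ]


def greedy_ctc_decode(frame_log_probs):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    out, prev = [], None
    for frame in frame_log_probs:
        best = max(frame, key=frame.get)
        if best != prev and best != BLANK:
            out.append(best)
        prev = best
    return "".join(out)


def to_log(frame):
    """Convert a frame of probabilities into log-probabilities."""
    return {ch: math.log(p) for ch, p in frame.items()}


# Three frames of made-up posteriors over a mixed Latin/Arabic alphabet.
frames = [
    to_log({"a": 0.6, "b": 0.1, "\u0628": 0.2, BLANK: 0.1}),
    to_log({"a": 0.1, "b": 0.3, "\u0628": 0.5, BLANK: 0.1}),
    to_log({"a": 0.1, "b": 0.6, "\u0628": 0.2, BLANK: 0.1}),
]

mixed = greedy_ctc_decode(frames)
latin_only = greedy_ctc_decode(mask_to_script(frames, "latin"))
print(repr(mixed), repr(latin_only))
```

In this toy input the second frame favors an Arabic character when all scripts compete, so the unmasked decode mixes scripts; once the frames are restricted to the Latin alphabet, the same input decodes to `ab`.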


Bibliographic details

Source
Pattern Recognition: The Journal of the Pattern Recognition Society | 2021, Issue 1 | 10 pages
Authors
Cheikhrouhou Ahmed; Kessentini Yousri; Kanoun Slim
Affiliations

Digital Res Ctr Sfax, Sfax, Tunisia;

Digital Res Ctr Sfax, Sfax, Tunisia;

Univ Sfax, MIRACL Lab, Sfax, Tunisia;

Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
CBP; CTC; Keyword spotting; Script identification; Handwritten;


Similar literature

1. Multi-Task Learning for Food Identification and Analysis with Deep Convolutional Neural Networks [J]. Xi-Jin Zhang, Yi-Fan Lu, Song-Hai Zhang. Journal of Computer Science and Technology (English edition). 2016, Issue 3
2. Food Ingredients Identification from Dish Images by Deep Learning [J]. Ziyi Zhu, Ying Dai. Computer and Communications (English). 2021, Issue 4
3. Machine learning identification of impurities in the STM images [J]. Ce Wang, Haiwei Li, Zhenqi Hao, Chinese Physics (English edition). 2020, Issue 11
4. Hybrid HMM/BLSTM system for multi-script keyword spotting in printed and handwritten documents with identification stage [J]. Neural computing & applications. 2020, Issue 13
5. PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification [J]. Obaidullah Sk Md, Halder Chayan, Santosh K. C., Multimedia Tools and Applications. 2018, Issue 2
6. Handwritten Indic Script Identification in Multi-Script Document Images: A Survey [J]. Obaidullah Sk Md, Santosh K. C., Das Nibaran, International Journal of Pattern Recognition and Artificial Intelligence. 2018, Issue 10
7. Simultaneous Script Identification and Handwriting Recognition via Multi-Task Learning of Recurrent Neural Networks [C]. Zhuo Chen, Yichao Wu, Fei Yin, IAPR International Conference on Document Analysis and Recognition. 2017
8. Keywords in the mist: Automated keyword extraction for very large documents and back of the book indexing. [D]. Csomai, Andras. 2008
9. Click-words: learning to predict document keywords from a user perspective [O]. Rezarta Islamaj Doğan, Zhiyong Lu
10. Keyword Spotting on Hangul Document Images Using Image-to-Image Matching [O]. Sang Cheol Park, Hwa Jeong Son, Soo Hyung Kim. 2005
11. Automatic script identification from images using cluster-based templates [R]. Hochberg, J., Kerns, L., Kelly, P., 1995
