Detecting Named Entities in Unstructured Bengali Manuscript Images

Abstract

In this paper, we detect named entities directly from unstructured handwritten document images, without any intermediate text or character recognition. We receive no assistance from natural language processing, which makes detecting the named entities more challenging. We work on Bengali script, which brings additional hurdles due to its unique script characteristics. We propose a new deep neural network-based architecture to extract latent features from a text image. The resulting embedding is fed to a BLSTM (Bidirectional Long Short-Term Memory) layer, after which an attention mechanism is adapted for named entity detection. We experiment on two publicly available offline handwriting repositories containing 420 Bengali handwritten pages in total. Our system attains 95.43% balanced accuracy on overall named entity detection.
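The abstract reports balanced accuracy but this record does not define it. As a minimal sketch, assuming the standard definition (the mean of per-class recall) and assuming hypothetical word-level labels where 1 marks a named entity and 0 any other word, the metric could be computed as follows. The function name and the sample labels are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels (assumption): 1 = named entity, 0 = other word.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

Named entities are typically a small minority of the words on a page, so plain accuracy can look high even when many entities are missed. Averaging recall per class gives the minority class equal weight, which is presumably why a balanced figure is reported.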

Bibliographic details

Source
International Conference on Document Analysis and Recognition | 2019 | 1 v. | 6 pages
Authors
Chandranath Adak; Bidyut B. Chaudhuri; Chin-Teng Lin; Michael Blumenstein;
Original format: PDF
Chinese Library Classification: Automation technology and equipment
Keywords
Task analysis; Feature extraction; Engines; Optical character recognition software; Kernel; Organizations; Natural language processing;
