International Conference on Document Analysis and Recognition

Detecting Named Entities in Unstructured Bengali Manuscript Images

Abstract

In this paper, we address the task of detecting named entities directly from unstructured handwritten document images, without any intermediate text or character recognition. In this setting, we receive no assistance from natural language processing, which makes named entity detection more challenging. We work on Bengali script, which poses additional hurdles due to its unique script characteristics. We propose a new deep neural network-based architecture to extract latent features from a text image. The resulting embedding is fed to a BLSTM (Bidirectional Long Short-Term Memory) layer, after which an attention mechanism is adapted for named entity detection. We experiment on two publicly available offline handwriting repositories containing 420 Bengali handwritten pages in total. Our system attains 95.43% balanced accuracy on overall named entity detection.
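The abstract reports its headline result as balanced accuracy. As a minimal sketch, assuming the standard definition of that metric (the unweighted mean of per-class recall), it can be computed as below. The function name and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper may compute the metric differently.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the unweighted mean of per-class recall.

    Assumes the standard definition of balanced accuracy; classes are
    taken from the labels that actually occur in y_true.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration: 1 = named entity, 0 = other word.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

score = balanced_accuracy(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, plain_accuracy)
```

In this toy case nine of ten words are labelled correctly, yet only one of the two named entities is found, so the balanced score is lower than plain accuracy. This is why the metric is a common choice when named entities are much rarer than ordinary words.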
