Journal of the American Society for Information Science and Technology

Image Retrieval From Scientific Publications: Text and Image Content Processing to Separate Multipanel Figures

Abstract

Images contained in scientific publications are widely considered useful for educational and research purposes, and accurate indexing of these images is critical for efficient and effective retrieval. Such image retrieval is complicated by the fact that figures in the scientific literature often combine multiple individual subfigures (panels). In certain types of scientific publications, multipanel figures are in fact the predominant pattern. The goal of this work is to automatically segment multipanel figures, a necessary step for automatic semantic indexing and for the development of image retrieval systems targeting the scientific literature. We have developed a method that uses both the image content and the associated figure caption to: (1) automatically detect panel boundaries; (2) detect panel labels in the images and convert them to text; and (3) detect the labels and textual descriptions of each panel within the captions. Our approach combines the outputs of the image-content and text-based processing steps to split multipanel figures into individual subfigures and to assign each subfigure its corresponding section of the caption. On the task of automatic segmentation of multipanel figures, the system achieved a precision of 81% and a recall of 73%.
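
The abstract does not specify the algorithms behind each step, so what follows is only a simplified sketch under stated assumptions, not the authors' method. It splits a caption at parenthesized single-letter labels such as "(A)", splits a grayscale image along rows and columns that are entirely near-white (a basic gutter heuristic standing in for panel-boundary detection, which only handles layouts separated by full-width and full-height white gutters), and pairs panels with caption segments by assuming the labels run A, B, C... in row-major reading order. Panel-label recognition inside the image (step 2) is omitted. The function names, the label pattern, and the `white` threshold are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical label convention: one letter in parentheses, e.g. "(A)" or "(b)".
LABEL_RE = re.compile(r"\(([A-Za-z])\)")

Box = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive


def split_caption(caption: str) -> Tuple[str, Dict[str, str]]:
    """Split a caption into shared text and per-panel text keyed by upper-case label."""
    matches = list(LABEL_RE.finditer(caption))
    if not matches:
        return caption.strip(), {}
    shared = caption[: matches[0].start()].strip()
    panels: Dict[str, str] = {}
    for i, m in enumerate(matches):
        # Each panel's text runs until the next label (or the end of the caption).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        panels[m.group(1).upper()] = caption[m.end():end].strip()
    return shared, panels


def _bands(is_gutter: List[bool]) -> List[Tuple[int, int]]:
    """Return [start, end) ranges of consecutive non-gutter positions."""
    bands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, gutter in enumerate(is_gutter):
        if not gutter and start is None:
            start = i
        elif gutter and start is not None:
            bands.append((start, i))
            start = None
    if start is not None:
        bands.append((start, len(is_gutter)))
    return bands


def split_panels(img: List[List[int]], white: int = 250) -> List[Box]:
    """Split a grayscale image (rows of 0-255 ints) at all-white rows, then at
    all-white columns inside each row band. Boxes come out in row-major order."""
    row_gutter = [all(p >= white for p in row) for row in img]
    boxes: List[Box] = []
    for top, bottom in _bands(row_gutter):
        band = img[top:bottom]
        col_gutter = [all(row[c] >= white for row in band) for c in range(len(band[0]))]
        for left, right in _bands(col_gutter):
            boxes.append((top, bottom, left, right))
    return boxes


def assign_captions(boxes: List[Box], panels: Dict[str, str]) -> Dict[str, Tuple[Box, str]]:
    """Pair boxes with caption segments, assuming labels run A, B, C... in reading
    order. zip() silently drops extras if the counts disagree."""
    return {label: (box, panels[label]) for label, box in zip(sorted(panels), boxes)}


if __name__ == "__main__":
    caption = "Cell images. (A) Control. (B) Treated."
    img = [
        [255, 255, 255, 255, 255],
        [0, 0, 255, 0, 0],
        [0, 0, 255, 0, 0],
        [255, 255, 255, 255, 255],
    ]
    shared, panels = split_caption(caption)
    print(shared)  # Cell images.
    print(assign_captions(split_panels(img), panels))
    # {'A': ((1, 3, 0, 2), 'Control.'), 'B': ((1, 3, 3, 5), 'Treated.')}
```

Pairing by reading order is the weakest assumption here. In the system the abstract describes, labels recognized inside the image would anchor each panel to its caption segment, which matters when panels are irregularly laid out or when the number of detected panels and caption labels disagree.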

Bibliographic record

  • Author affiliations

    Lister Hill National Center for Biomedical Communications, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD 20894 USA (all six authors)

  • Original format: PDF
  • Language: English

  • Date added to database: 2022-08-17 23:16:03
