Cross-reference identification within a PDF document

Abstract

Cross-references, such as footnotes, endnotes, figure and table captions, and references, are a common and useful type of page element that further explains the corresponding entities in the target document. In this paper, we focus on cross-reference identification in PDF documents and present a robust method, using the identification of footnotes and figure references as a case study. The proposed method first extracts footnotes and figure captions, and then matches them with their corresponding references within the document. A number of novel features of a PDF document, namely page layout, font information, and lexical and linguistic features of cross-references, are used for the task. Clustering is adopted to handle features that are stable within one document but vary across document types, so that the identification process adapts to the document type. In addition, the method uses results from the matching process as feedback to the identification process, further improving its accuracy. Preliminary experiments on real document sets show that the proposed method is promising for identifying cross-references in PDF documents.
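The extract-then-match idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it works on plain text lines only, ignores the layout, font, clustering, and feedback components, and the names `CAPTION_RE`, `MENTION_RE`, and `match_figure_references` are hypothetical. It assumes that a caption is a line beginning with "Figure N." or "Fig. N:" and that any other mention of a figure number is an in-text reference.

```python
import re
from collections import defaultdict

# Assumption: a caption line starts with "Figure N" or "Fig. N" followed by "." or ":".
CAPTION_RE = re.compile(r"^\s*(?:Figure|Fig\.)\s*(\d+)\s*[.:]", re.IGNORECASE)
# Assumption: every other "Figure N" / "Fig. N" occurrence is an in-text reference.
MENTION_RE = re.compile(r"\b(?:Figure|Fig\.)\s*(\d+)", re.IGNORECASE)


def match_figure_references(lines):
    """Map each figure number to its caption line index and the
    indexes of the lines that refer to it."""
    captions = {}
    references = defaultdict(list)
    for idx, line in enumerate(lines):
        caption = CAPTION_RE.match(line)
        if caption:
            # Keep the first caption seen for a given figure number.
            captions.setdefault(int(caption.group(1)), idx)
            continue
        for mention in MENTION_RE.finditer(line):
            references[int(mention.group(1))].append(idx)
    # Only figures with an extracted caption are reported.
    return {
        number: {"caption": cap_idx, "references": references.get(number, [])}
        for number, cap_idx in captions.items()
    }
```

Given the lines "As shown in Figure 1, the pipeline has two stages.", "Figure 1. Overview of the pipeline.", "Results are in Fig. 2 and compared with Figure 1.", and "Fig. 2: Accuracy by document type.", the function pairs figure 1 (caption at index 1) with lines 0 and 2, and figure 2 (caption at index 3) with line 2. A purely lexical rule like this is brittle across document styles, which is the kind of variation the abstract says the layout and font features and the clustering step are meant to handle.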

Bibliographic record

Source
Document Recognition and Retrieval XXII | 2015 | 940209.1-940209.10 | 10 pages
Conference location: San Francisco, CA (US)
Authors
Sida Li; Liangcai Gao; Zhi Tang; Yinyan Yu;
Author affiliations

Institute of Computer Science Technology, Peking University, Beijing, China (all four authors)
Original format: PDF
Language: English
Keywords
cross-reference; PDF documents; document analysis and understanding;

Similar literature

1. Mathematical formula identification and performance evaluation in PDF documents [J]. Xiaoyan Lin, Liangcai Gao, Zhi Tang. International Journal on Document Analysis and Recognition, 2014, No. 3
2. Identifying Similar Cases in Document Networks Using Cross-Reference Structures [J]. Botsis Taxiarchis, Scott John, Woo Emily Jane. IEEE Journal of Biomedical and Health Informatics, 2015, No. 6
3. Mineral identification by elemental composition: a new tool within PDF-4 databases [J]. Fawcett T. G., Blanton J. R., Kabekkodu S. N. Powder Diffraction, 2018, No. 2
4. Cross-reference identification within a PDF document [C]. Sida Li, Liangcai Gao, Zhi Tang. Conference on Document Recognition and Retrieval XXII, 2015
5. Automatic semantic header generator for PDF documents [D]. Xue, Furong. 2004
6. Desktop document delivery using portable document format (PDF) files and the Web [O]. J P Shipman, W L Gembala, J M Reeder. 1998
7. Preface [O]. Lagroix France, Muxworthy Adrian, Hoffmann Viktor. 2006. doi:10.1016/j.pepi.2005.12.002
