Conference: International Workshop of the Initiative for the Evaluation of XML Retrieval

The Book Structure Extraction Competition with the Resurgence Software at Caen University


Abstract

The GREYC Island team took part for the first time in the Structure Extraction Competition of the INEX Book track, using the Resurgence software. We applied a minimal strategy based primarily on a top-down representation of the document. The main idea is to use a model that describes the relationships between elements of the document structure. Chapters are represented, and implemented, as the frontiers between consecutive chapters. The page is also used as a structural element. The periphery/center relationship is computed over the entire document and then reflected onto each page. The approach has several strengths: it processes the entire document; it handles books without a table of contents (ToC), as well as titles that do not appear in the ToC (e.g. a preface); it does not depend on a lexicon, so it is tolerant to OCR errors and language-independent; and it is simple and fast.
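To make the idea of representing chapters by their frontiers concrete, here is a minimal sketch, not the Resurgence implementation. It assumes that a frontier is recorded as the 0-based index of the page on which a new chapter begins, and it shows how a list of such frontiers fully determines chapter page ranges, including a leading segment for front matter (e.g. a preface) that precedes the first detected frontier. It does not detect frontiers: the periphery/center layout analysis is not reproduced here. The names `Chapter` and `chapters_from_frontiers` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    start_page: int  # first page of the segment (0-based)
    end_page: int    # last page of the segment, inclusive


def chapters_from_frontiers(frontiers: list[int], num_pages: int) -> list[Chapter]:
    """Convert chapter frontiers into contiguous page ranges.

    Hypothetical convention: a frontier at page p means a new chapter starts on p.
    Duplicate and out-of-range frontiers are ignored. Pages before the first
    frontier form their own leading segment.
    """
    if num_pages <= 0:
        return []
    # Keep valid, unique frontier pages in reading order.
    starts = sorted({p for p in frontiers if 0 <= p < num_pages})
    # The document always opens a segment on its first page.
    if not starts or starts[0] != 0:
        starts.insert(0, 0)
    chapters = []
    for i, start in enumerate(starts):
        # Each segment ends just before the next frontier, or at the last page.
        end = starts[i + 1] - 1 if i + 1 < len(starts) else num_pages - 1
        chapters.append(Chapter(start, end))
    return chapters
```

For a 40-page book with frontiers on pages 5 and 20, `chapters_from_frontiers([5, 20], 40)` yields three segments covering pages 0–4, 5–19 and 20–39. Storing only the frontiers keeps the representation minimal: the chapters never overlap and always cover the whole document, which matches the whole-document emphasis of the approach.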