Conference proceedings: Comparative evaluation of focused retrieval

The Book Structure Extraction Competition with the Resurgence Software for Part and Chapter Detection at Caen University

Abstract

The GREYC Island team participated for the second time in the Structure Extraction Competition of the INEX Book track, using the Resurgence software. We used a minimal strategy based primarily on a top-down document representation with two levels, part and chapter. The main idea is to use a model that describes the relationships between elements of the document structure. Boundaries between high-level units are detected, first for parts and then for chapters. Pages are also used. The periphery-center relationship is calculated over the entire document and reflected on each page. The strengths of the approach are that it deals with the entire document; it handles books without tables of contents (ToCs) and titles that are not listed in the ToC (e.g., the preface); it does not depend on a lexicon and is therefore tolerant of OCR errors and language independent; and it is simple and fast.