Venue: Document Recognition and Retrieval XVIII

Reflowing-driven paragraph recognition for electronic books in PDF

Abstract

When electronic books are read on handheld devices, their content sometimes needs to be reflowed and recomposed to fit the small screen. Because people read paragraph by paragraph, the paragraph is a natural unit for reflowing text. This paper therefore proposes a set of novel methods for paragraph recognition in PDF electronic books. The methods comprise three steps: physical structure analysis, paragraph segmentation, and reading order detection. We exploit the locally ordered property of PDF documents and the layout style of books to improve on traditional page recognition results. In addition, we use optimal bipartite graph matching to detect the reading order of paragraphs. Experiments show that our methods achieve high accuracy. Notably, this research has been applied in a commercial software package for Chinese e-book production.
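The abstract names optimal bipartite matching as the tool for reading order detection but does not say how the graph or its edge weights are built. As a rough illustration only, the sketch below assumes one possible formulation: every block appears once as a "predecessor" and once as a "successor", a hypothetical layout-based cost scores each predecessor-successor link, and a minimum-cost assignment (Hungarian algorithm) picks one successor per block. The names `link_cost`, `COLUMN_JUMP`, `FORBIDDEN`, the START/END sentinels, and the cost formula itself are assumptions made for this example, not the paper's method.

```python
# Toy sketch: reading order via minimum-cost bipartite matching.
# The cost model below is hypothetical, not taken from the paper.

FORBIDDEN = 10**9    # large finite cost for links we want to rule out
COLUMN_JUMP = 50     # assumed fixed penalty for moving to a column on the right


def min_cost_assignment(cost):
    """Hungarian algorithm, O(n^2 * m). Requires len(cost) <= len(cost[0]).
    Returns a list where entry i is the column assigned to row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (m + 1)      # column potentials
    p = [0] * (m + 1)      # p[j] = row (1-based) matched to column j, 0 = free
    way = [0] * (m + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:              # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def link_cost(a, b, col_bottom):
    """Hypothetical cost of reading block b immediately after block a.
    Blocks are (name, x, y) tuples with y growing downward."""
    _, ax, ay = a
    _, bx, by = b
    if bx == ax and by > ay:       # b lies below a in the same column
        return by - ay
    if bx > ax:                    # b lies in a column to the right of a
        return COLUMN_JUMP + (col_bottom - ay) + by
    return FORBIDDEN               # self-links and backward links


def reading_order(blocks):
    n = len(blocks)
    max_x = max(b[1] for b in blocks)
    max_y = max(b[2] for b in blocks)
    # Row 0 is a START sentinel; rows 1..n are blocks acting as predecessors.
    # Columns 0..n-1 are blocks acting as successors; column n is END.
    start_row = [b[1] + b[2] for b in blocks] + [FORBIDDEN]
    cost = [start_row]
    for a in blocks:
        row = [link_cost(a, b, max_y) for b in blocks]
        row.append((max_x - a[1]) + (max_y - a[2]))   # cost of ending at a
        cost.append(row)
    succ = min_cost_assignment(cost)
    order, cur = [], succ[0]
    while cur != n and len(order) < n:   # length guard in case of a cycle
        order.append(blocks[cur][0])
        cur = succ[cur + 1]
    return order


if __name__ == "__main__":
    # Two-column page, blocks given in arbitrary order.
    page = [("C", 300, 0), ("A", 0, 0), ("D", 300, 100), ("B", 0, 100)]
    print(reading_order(page))   # ['A', 'B', 'C', 'D']
```

A plain matching does not by itself rule out cycles among the chosen links, so the chain-following loop carries a length guard; a production system would need a cost model and constraints suited to real book layouts.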
