Conference: Electronic Imaging Science and Technology Symposium

Using definite clause grammars to build a global system for analyzing collections of documents


Abstract

Collections of documents are sets of heterogeneous documents, such as a specific ancient book series, with their own structural and semantic properties linking them. A particular collection contains document images with specific physical layouts, such as text pages or full-page illustrations, appearing in a specific order. Its contents, such as journal articles, may be spread over several pages that are not necessarily consecutive, which produces strong dependencies between the interpretations of pages. To build an analysis system that can bring contextual information from the collection to the appropriate recognition modules for each page, we propose to express the structural and semantic properties of a collection with a definite clause grammar. This is made possible by representing collections as streams of document images and by using extensions to the formalism, which we present here. We are then able to automatically generate a parser dedicated to a collection. Besides allowing structural variations and complex information flows, we also show that this approach enables the design of analysis stages on a document or a set of documents. The value of using context is illustrated with several examples and their formalization in this framework.
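As a rough illustration of the idea of parsing a collection as a stream of page images, the sketch below mimics two grammar rules in plain Python. It is not the authors' system, and the abstract does not describe their extensions to the formalism. Definite clause grammars are normally written in Prolog. The page layouts ("title", "text", "illustration"), the rule names `article` and `collection`, and the sample stream are all hypothetical, chosen only to show how context (here, an article title) can be passed from one page to the pages that follow it.

```python
from typing import Dict, List, Optional, Tuple

# A page is a (layout, content) pair. Layout names are assumptions for this sketch.
Page = Tuple[str, str]


def page(layout: str, pages: List[Page], i: int) -> Optional[Tuple[str, int]]:
    """Terminal rule: consume one page of the expected layout at position i."""
    if i < len(pages) and pages[i][0] == layout:
        return pages[i][1], i + 1
    return None


def article(pages: List[Page], i: int) -> Optional[Tuple[Dict, int]]:
    """article --> title_page, (text_page | illustration)*

    The title read on the first page is attached to every later page,
    standing in for contextual information handed to a recognizer.
    """
    head = page("title", pages, i)
    if head is None:
        return None
    title, i = head
    body = []
    while True:
        nxt = page("text", pages, i) or page("illustration", pages, i)
        if nxt is None:
            break
        content, i = nxt
        body.append((title, content))
    return {"title": title, "pages": body}, i


def collection(pages: List[Page]) -> Optional[List[Dict]]:
    """collection --> article+ ; succeeds only if the whole stream is consumed."""
    i, articles = 0, []
    while i < len(pages):
        parsed = article(pages, i)
        if parsed is None:
            return None
        art, i = parsed
        articles.append(art)
    return articles or None


# Hypothetical stream: an illustration interrupts the text of the first article.
stream = [
    ("title", "A"), ("text", "a1"), ("illustration", "fig"), ("text", "a2"),
    ("title", "B"), ("text", "b1"),
]
result = collection(stream)
```

Each rule takes a position in the stream and returns the new position, which plays the role of the hidden list arguments that a definite clause grammar threads through its rules. A stream that does not start with a title page is rejected by `collection`.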
