Extraction of text and graphics from PDF files for use in browser files such as HTML files with defined anchorable information units or hyperlinks by direct text and graphic extraction rather than use of scanning based approaches

Abstract

A system processes multimedia files to present their information in browser-suitable multimedia data files. The system comprises a content parser that identifies text and graphics within a file; an image processor that processes the identified graphics content to find embedded text; a text sorter that parses the identified text and the embedded text to find text elements according to preset sorting rules; and a memory that stores browser files containing the text elements. Independent claims are made for a method of producing anchorable information units (AIUs) from PDF documents, which extracts text segments from a PDF document, determines each segment's context (chosen from a hierarchical structure), and defines the text segments as AIUs; and for a machine-readable program storage device containing a program for extracting text and other content from PDF files.
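The claimed method (extract text segments, give each a context from a hierarchical structure, define the segments as AIUs) can be illustrated with a minimal Python sketch. It assumes the text segments and their font sizes have already been pulled out of the PDF, and it uses font size to assign heading levels as a hypothetical stand-in for the patent's "preset sorting rules". The names `TextSegment`, `AIU`, `build_aius`, and `to_html` are illustrative only and do not come from the patent.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class TextSegment:
    """A run of text assumed to be already extracted from a PDF page."""
    text: str
    font_size: float


@dataclass
class AIU:
    """Hypothetical anchorable information unit: a segment plus its context."""
    anchor_id: str
    text: str
    context: list[str] = field(default_factory=list)


def build_aius(segments: list[TextSegment],
               heading_levels: dict[float, int]) -> list[AIU]:
    """Turn segments into AIUs whose context is a path in a heading hierarchy.

    heading_levels maps a font size to a hierarchy level (0 = top level).
    Using font size this way is an assumed stand-in for "preset sorting rules".
    """
    path: list[str] = []  # headings currently open, one per level
    aius: list[AIU] = []
    for i, seg in enumerate(segments):
        level = heading_levels.get(seg.font_size)
        if level is not None:
            # A heading closes every open heading at its level or deeper.
            del path[level:]
        # The context is the chain of enclosing headings at this point.
        aius.append(AIU(f"aiu-{i}", seg.text, list(path)))
        if level is not None:
            # Text that follows is nested under this heading.
            path.append(seg.text)
    return aius


def to_html(aiu: AIU) -> str:
    """Render an AIU as an HTML anchor; the context goes in the title attribute."""
    title = escape(" > ".join(aiu.context))
    return f'<a id="{aiu.anchor_id}" title="{title}">{escape(aiu.text)}</a>'


if __name__ == "__main__":
    segments = [
        TextSegment("1 Introduction", 14.0),
        TextSegment("Overview text", 10.0),
        TextSegment("1.1 Scope", 12.0),
        TextSegment("Scope text", 10.0),
    ]
    for aiu in build_aius(segments, {14.0: 0, 12.0: 1}):
        print(to_html(aiu))
```

Running the module prints one `<a id="...">` element per segment. A real converter would place these inside an HTML page so that hyperlinks elsewhere can target each unit by its `id`; the context path is kept here only to show how a segment's position in the hierarchy can travel with the unit.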

Bibliographic data

  • Publication number: DE10162156A1
  • Publication date: 2002-07-25
  • Original format: PDF
  • Applicant/assignee: SIEMENS CORP. RESEARCH INC.
  • Application number: DE2001162156
  • Inventors: CHAKRABORTY AMIL; HSU LIANG-HUA
  • Filing date: 2001-12-17
  • IPC classification: G06F17/21; G06T11/60; G06F17/30; G06F3/037
  • Country: DE

  • 入库时间 2022-08-22 00:26:50

相似文献

  • 专利
  • 外文文献
获取专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号