Journal: International Journal on Document Analysis and Recognition

Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms


Abstract

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of 'tool-tips' and hyperlinks from part labels to their associated descriptions, the USPTO hosted a monthlong online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30 %) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57 % for figure region detection, 78.81 % for figure regions with correctly recognized figure titles, and 70.98 % for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
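The abstract reports each result as an F-measure, the harmonic mean of recall and precision. As a minimal sketch of that metric only (the function name `f_measure` and the input values are illustrative assumptions, not the competition's scoring code or its data), it can be computed as follows:

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F-measure)."""
    # Convention assumed here: if both inputs are zero, the score is zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative values only; these are not results from the competition.
print(round(f_measure(0.90, 0.80), 4))  # prints 0.8471
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-measure by maximizing recall at the expense of precision, or the reverse.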
