Journal: Library Hi Tech

Heuristics for identification of bibliographic elements from verso of title pages

Abstract

This paper presents a methodology for capturing bibliographic data from the versos of the title pages of documents. A survey was undertaken to identify the syntactic and semantic features of bibliographic elements on title-page versos. These features include font size, line numbers, and the appearance of certain character strings. Particular emphasis is given to cataloguing-in-publication (CIP) data. The results of the survey are used to develop heuristics that can support a program for automatically identifying the various bibliographic data elements. The versos of the title pages are scanned and stored as HTML pages using optical character recognition (OCR) software, and the heuristics are then applied to these HTML pages. A few samples of the input and of the generated output are presented. Finally, the problems related to OCR and to the heuristics are enumerated.
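The abstract names the features the heuristics use but not the rules themselves. To make the pipeline concrete (OCR output stored as HTML, then rules over font size, line position and character strings), here is a minimal Python sketch using only the standard library. Everything in it is an assumption for illustration, not the paper's method: the `<font size="N">` markup, the regular expressions, the rule that a change of font size ends the CIP block, the hypothetical helper names `VersoLines`, `extract_lines` and `identify`, and the made-up sample verso. The title rule relies on the ISBD convention that " / " introduces the statement of responsibility.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns for a few common verso elements, not the paper's rule set.
ISBN_RE = re.compile(r"ISBN(?:-1[03])?:?\s*((?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dXx])")
COPYRIGHT_RE = re.compile(r"(?:©|\(c\)|copyright)\s*(\d{4})", re.I)
PUBLISHER_RE = re.compile(r"published\s+by\s+(.+)", re.I)
CIP_RE = re.compile(r"catalogu?ing[-\s]in[-\s]publication", re.I)
AUTHOR_RE = re.compile(r"^[A-Z][\w'-]+, [A-Z]")  # "Surname, Forename" heading
GENERAL_RULES = (("isbn", ISBN_RE), ("copyright_year", COPYRIGHT_RE),
                 ("publisher", PUBLISHER_RE))


class VersoLines(HTMLParser):
    """Split OCR-produced HTML into (line_no, font_size, text) triples.
    Assumes <p>, <div> or <br> ends a line and <font size="N"> marks size."""
    def __init__(self):
        super().__init__()
        self.lines, self._buf = [], []
        self._sizes = []        # sizes of the currently open <font> tags
        self._line_size = None  # largest font size seen on the current line

    def _flush(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.lines.append((len(self.lines) + 1, self._line_size, text))
        self._buf, self._line_size = [], None

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "div", "br"):
            self._flush()
        elif tag == "font":
            size = dict(attrs).get("size")
            self._sizes.append(int(size) if size and size.isdigit() else None)

    def handle_endtag(self, tag):
        if tag in ("p", "div"):
            self._flush()
        elif tag == "font" and self._sizes:
            self._sizes.pop()

    def handle_data(self, data):
        self._buf.append(data)
        size = self._sizes[-1] if self._sizes else None
        if data.strip() and size is not None:
            self._line_size = max(size, self._line_size or 0)

    def close(self):
        super().close()
        self._flush()  # emit a trailing line that no closing tag ended


def extract_lines(html):
    parser = VersoLines()
    parser.feed(html)
    parser.close()
    return parser.lines


def identify(lines):
    """Return {element: (line_no, value)}, keeping the first match per element."""
    found = {}
    in_cip, cip_size, cip_pos = False, None, 0
    for line_no, size, text in lines:
        if CIP_RE.search(text):
            in_cip, cip_size, cip_pos = True, size, 0
            found.setdefault("cip_header", (line_no, text))
            continue
        if in_cip and size != cip_size:
            in_cip = False  # assumption: a change of font size ends the CIP block
        if in_cip:
            cip_pos += 1
            if cip_pos == 1 and AUTHOR_RE.match(text):
                found.setdefault("author", (line_no, text))
            elif " / " in text:  # ISBD: " / " precedes the statement of responsibility
                found.setdefault("title", (line_no, text.split(" / ")[0]))
        for name, regex in GENERAL_RULES:
            match = regex.search(text)
            if match:
                value = match.group(1)
                if name == "isbn":
                    value = re.sub(r"[-\s]", "", value)
                found.setdefault(name, (line_no, value))
    return found


# A made-up verso page, as an OCR tool might save it.
SAMPLE_HTML = """<p><font size="3">Published by Example Press, London</font></p>
<p><font size="3">&copy; 1999 Example Press</font></p>
<p><font size="2">British Library Cataloguing in Publication Data</font></p>
<p><font size="2">Smith, Jane, 1950-</font></p>
<p><font size="2">Indexing verso pages / Jane Smith.</font></p>
<p><font size="2">ISBN 0-12-345678-9</font></p>
<p><font size="3">Printed in Great Britain</font></p>"""

if __name__ == "__main__":
    for element, (line_no, value) in identify(extract_lines(SAMPLE_HTML)).items():
        print(f"line {line_no}: {element} = {value}")
```

Running the script prints each identified element together with the line it came from. Each match keeps its line number so that line position can serve as a further feature, for example to prefer an ISBN found inside the CIP block. Real OCR output would also need tolerance for misrecognized characters (such as "1SBN" for "ISBN"), which this sketch does not attempt.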