Conference: World Multiconference on Systemics, Cybernetics and Informatics

Automated Document Labeling for Web-based Online Medical Journals


Abstract

An increasing number of publishers are using the Internet and the World Wide Web to provide their subscribers with access to online journals. New techniques are needed to capture, classify, analyze, extract, modify, and reformat Web-based document information for computer storage, access, and processing. An R&D division of the National Library of Medicine (NLM) is developing an automated system, temporarily code-named WebMARS (Web-based Medical Article Records System), to download and analyze Web-based journal articles and extract bibliographic information from them, producing citation records for its MEDLINE database. This paper describes one component of this system: assigning meaningful labels to text zones containing the article title, author names, affiliation, and abstract. The labeling technique combines features derived from the World Wide Web Consortium Document Object Model (W3C DOM) and from an analysis of each journal's page layout with DOM-based analysis of node location and content, string pattern matching, and a depth-first node traversal algorithm. Experiments carried out on a variety of Web-based medical journals demonstrate the feasibility of this automated document labeling approach. Preliminary evaluation results on a small set of Web-based medical journal articles show that the system is capable of labeling text zones with an accuracy of over 95%.
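As a rough illustration of the general idea only, and not the WebMARS implementation, the sketch below builds a small DOM-like tree with Python's standard `html.parser`, walks it depth-first, and labels each text zone using node location (the tag) plus string pattern matching. The `Node` class, the `RULES` table, the regular expressions, the `label_html` helper, and the sample page are all hypothetical and chosen for illustration; the sketch also assumes well-formed HTML without void elements such as `<br>`.

```python
import re
from html.parser import HTMLParser


class Node:
    """Minimal stand-in for a DOM node (hypothetical, not the W3C API)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects; assumes well-formed, properly nested HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        # Accumulate the node's own text, ignoring pure whitespace.
        self.current.text = (self.current.text + " " + data.strip()).strip()


# Illustrative content rules; a real system would tune these per journal layout.
RULES = [
    ("abstract", re.compile(r"^(abstract|summary)\b", re.I)),
    ("affiliation", re.compile(r"\b(department|university|institute|hospital)\b", re.I)),
    ("author", re.compile(
        r"^[A-Z][\w.\-]*( [A-Z][\w.\-]*)+(, [A-Z][\w.\-]*( [A-Z][\w.\-]*)+)*$")),
]


def classify(node):
    """Label one text zone from its location (tag) and its content."""
    if node.tag == "h1":  # location feature: assume the top heading is the title
        return "title"
    for label, pattern in RULES:
        if pattern.search(node.text):
            return label
    return "other"


def label_zones(node, labels=None):
    """Pre-order depth-first traversal collecting (label, text) pairs."""
    if labels is None:
        labels = []
    if node.text:
        labels.append((classify(node), node.text))
    for child in node.children:
        label_zones(child, labels)
    return labels


def label_html(html):
    builder = TreeBuilder()
    builder.feed(html)
    return label_zones(builder.root)


SAMPLE_HTML = """<html><body>
<h1>Automated Labeling of Journal Pages</h1>
<p>Jane Doe, John Smith</p>
<p>Department of Examples, Sample University</p>
<p>Abstract: This is a made-up abstract.</p>
</body></html>"""

if __name__ == "__main__":
    for label, text in label_html(SAMPLE_HTML):
        print(label, "|", text)
```

The rule order matters in this sketch: the location check runs before any content pattern, and the more specific content patterns come first, so a zone is labeled by the first rule it satisfies.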
