Recognition of HTML Table Structure

Abstract

Tables in HTML Web pages have become a valuable source of knowledge, so it is both reasonable and necessary to develop an algorithm that extracts knowledge from them. Doing so requires a system that identifies the boundary between the attributes and the values of an HTML table. In this paper, we propose an algorithm for this purpose. The idea is that a row (or column) with low similarity to the other rows (or columns) is probably an attribute-name row (or column), while the remaining rows (or columns) hold value data. An algorithm based on this idea achieves 82% recognition accuracy lengthways and 78% sideways on 300 HTML tables downloaded from Web pages.
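
The low-similarity idea in the abstract can be illustrated with a small Python sketch. The abstract does not say how similarity is measured, so everything here is an assumption made for illustration: the coarse cell types in cell_kind, the agreement ratio in line_similarity, the 0.5 threshold, and all function names are hypothetical and are not taken from the paper. The table is toy data.

```python
from statistics import mean


def cell_kind(cell: str) -> str:
    """Coarse cell type (an assumed feature, not the paper's)."""
    text = cell.strip()
    if not text:
        return "empty"
    try:
        float(text.replace(",", ""))  # treat "1,200" as a number
        return "number"
    except ValueError:
        return "text"


def line_similarity(a, b) -> float:
    """Fraction of aligned cells whose coarse kinds agree."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(cell_kind(x) == cell_kind(y) for x, y in zip(a, b)) / n


def mean_similarities(lines):
    """Mean similarity of each row (or column) to all the others."""
    if len(lines) < 2:
        return [1.0] * len(lines)  # nothing to compare against
    return [
        mean(line_similarity(line, other)
             for j, other in enumerate(lines) if j != i)
        for i, line in enumerate(lines)
    ]


def guess_attribute_lines(table, threshold=0.5):
    """Flag rows and columns whose mean similarity falls below threshold.

    The threshold is an arbitrary illustrative value.
    """
    rows = [list(r) for r in table]
    cols = [list(c) for c in zip(*rows)]  # ragged rows are truncated
    return {
        "rows": [i for i, s in enumerate(mean_similarities(rows))
                 if s < threshold],
        "columns": [i for i, s in enumerate(mean_similarities(cols))
                    if s < threshold],
    }


# Toy table: first row holds attribute names, the rest hold values.
table = [
    ["Item", "Price", "Stock"],
    ["apple", "120", "30"],
    ["pear", "150", "12"],
    ["plum", "90", "45"],
]
result = guess_attribute_lines(table)
```

On this toy table the sketch flags row 0 as a likely attribute-name row. It also flags column 0, because the item names differ in kind from the two numeric columns. A measure this crude will misjudge all-text tables, which is one reason a real system would need richer similarity features.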

Bibliographic details

Source: International Joint Conference on Natural Language Processing, 2004, 6 pages
Conference location: Sanya (CN)
Authors: Hidetaka MASUDA; Shuichi TSUKAMOTO; Hiroshi NAKAGAWA
Conference organizers: Association for Computational Linguistics (ACL); Association for Computational Linguistics and Chinese Language Processing (ACLCLP); Association of Natural Language Processing (ANLP)
Original format: PDF
Chinese Library Classification: programming languages, algorithmic languages

Similar literature

1. Extracting logical structures from HTML tables [J]. Yeon-Seok Kim, Kyong-Ho Lee. Computer Standards & Interfaces. 2008, No. 5
2. An indexing method for table structures of HTML format [J]. Masami Shishibori, Yoshihiro Iwaguchi, Minsoo Jung. 電子情報通信学会技術研究報告. データ工学. Data Engineering. 2001, No. 192
3. Recognition of HTML Table Structure [C]. Hidetaka MASUDA, Shuichi TSUKAMOTO, Hiroshi NAKAGAWA. International Joint Conference on Natural Language Processing; March 22-24, 2004; Sanya (CN). 2004
4. Designing instructional materials for teaching HTML to create Web page tables: Applying cognitive load theory [D]. Hogg, Nanette M. 2004
5. CH5M3D: an HTML5 program for creating 3D molecular structures [O]. Clarke W Earley. 2013
6. Automating the extraction of data from HTML tables with unknown structure [O]. David W. Embley, Cui Tao, Stephen W. Liddle. 2005
