Bootstrapping Semantic Annotation for Content-Rich HTML Documents

Abstract

An enormous amount of semantic data is still being encoded in HTML documents. Identifying and annotating the semantic concepts implicit in such documents makes them directly amenable to Semantic Web processing. In this paper we describe a highly automated technique for annotating HTML documents, especially template-based, content-rich documents that contain many different semantic concepts per document. Starting with a small seed of hand-labeled instances of semantic concepts in a set of HTML documents, we bootstrap an annotation process that automatically identifies unlabeled concept instances present in other documents. The bootstrapping technique exploits the observation that semantically related items in content-rich documents exhibit consistency in presentation style and spatial locality; it uses this observation to learn a statistical model for accurately identifying different semantic concepts in HTML documents drawn from a variety of Web sources. We also present experimental results on the effectiveness of the technique.

Bibliographic details

Source
International Conference on Data Engineering | 2005 | 11 pages
Authors
Mukherjee S.; Ramakrishnan I.V.; Singh A.
Original format: PDF
Chinese Library Classification: TP274-53

Similar documents

1. Semantic Annotation of Wiki Using Wiki Markup for HTML5 Microdata [J]. Vignesh Nandha Kumar K R, Pandurangan N, Vijayakumar R. International Journal of Engineering Science and Technology, 2010, No. 12
2. A Semantic Based Approach for Information Retrieval from HTML Documents Using Wrapper Induction Technique [J]. A. M. Abirami, A. Askarunisa, T. M. Aishwarya. Computer Science & Information Technology, 2013, No. 6
3. Framework of Semantic Annotation of Arabic Document using Deep Learning [J]. Saeed Albukhitan, Ahmed Alnazer, Tarek Helmy. Procedia Computer Science, 2020, No. 5
4. Bootstrapping Semantic Annotation for Content-Rich HTML Documents [C]. Mukherjee, S., Ramakrishnan, . 2005
5. Semantic hierarchies of HTML documents and their applications [D]. Lim, SeungJin. 2001
6. Easing semantically enriched information retrieval: An interactive semi-automatic annotation system for medical documents [O]. Theresia Gschwandtner, Katharina Kaiser, Patrick Martini
7. Bootstrapping Semantic Annotation for Content-Rich HTML Documents [O]. Saikat Mukherjee, I. V. Ramakrishnan, Amarjeet Singh. 2005
