Venue: International Semantic Web Conference

Deployment of RDFa, Microdata, and Microformats on the Web - A Quantitative Analysis



Abstract

More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata, and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages that originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data, and the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies published by the large Web companies is that the analyzed crawl and the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
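
To make the three markup standards concrete, the following is a minimal sketch of how a single HTML page might be checked for each of them. It is not the extraction pipeline used in the paper. The attribute and class-name cues below (for example `itemscope` for Microdata, `property` for RDFa, and `vcard` for Microformats) are simplifying assumptions chosen for illustration, and `MarkupDetector` and `detect_formats` are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative cues only (an assumption, not the paper's extraction rules).
MICRODATA_ATTRS = {"itemscope", "itemtype", "itemprop"}
RDFA_ATTRS = {"property", "typeof", "vocab", "about"}
MICROFORMAT_CLASSES = {"vcard", "vevent", "hreview", "hrecipe", "hentry"}


class MarkupDetector(HTMLParser):
    """Collects the names of the markup standards seen in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        names = {name for name, _ in attrs}
        if names & MICRODATA_ATTRS:
            self.found.add("Microdata")
        if names & RDFA_ATTRS:
            self.found.add("RDFa")
        for name, value in attrs:
            # Classic Microformats are signalled through class names.
            if name == "class" and value:
                if set(value.split()) & MICROFORMAT_CLASSES:
                    self.found.add("Microformats")


def detect_formats(html):
    """Return the set of markup standards detected in an HTML string."""
    detector = MarkupDetector()
    detector.feed(html)
    detector.close()
    return detector.found
```

Applied to every page of a crawl, a check of this kind would yield per-format page counts. A real study would also need a full parser for each syntax in order to extract the embedded data itself, which this sketch does not attempt.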
