AgentMat: Framework for data scraping and semantization

Abstract

Most of the enormous amount of information on the Internet is available only as Web pages made for human readers. These pages have no common interface for accessing, searching, or browsing the data. Hence, it is hard to extract semantic data from the Web, categorize it, and keep it up to date. For this purpose we have designed and implemented a system called AgentMat, which is designed for efficient extraction of large amounts of data from Web pages. AgentMat processing is based on an XML-based language that describes a given extraction task declaratively. A task description consists of system components which, connected together, perform the desired functionality on a general Web page. With this scraping system, raw content from irregularly updated, unstructured Web pages can be kept categorized and accessed together with its semantic metadata. In our pilot implementation we built the MediaPub system, which extracts information from various Web pages, categorizes it automatically, and checks for duplicates.
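The abstract does not show the task-description language itself, so the following is only a minimal sketch of the general idea: an XML document lists components, and an interpreter chains them over a page. Every element name, attribute, and function here (`task`, `component`, `type`, `pattern`, `run_task`, and so on) is a hypothetical stand-in, not AgentMat's actual syntax or API. The duplicate check is a simple text hash chosen for illustration; the keywords suggest the real system checks images.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical task description. The element and attribute names are
# invented for this sketch and are NOT AgentMat's real language.
TASK_XML = """
<task name="headlines">
  <component type="extract" pattern="&lt;h2&gt;(.*?)&lt;/h2&gt;"/>
  <component type="categorize" keyword="football" category="sport"/>
  <component type="deduplicate"/>
</task>
"""


def extract(items, attrs):
    """Turn each raw page string into one record per regex match."""
    pattern = re.compile(attrs["pattern"], re.DOTALL)
    return [{"text": m.strip()} for page in items for m in pattern.findall(page)]


def categorize(items, attrs):
    """Label records whose text mentions the keyword; others stay uncategorized."""
    keyword = attrs["keyword"].lower()
    for item in items:
        if keyword in item["text"].lower():
            item["category"] = attrs["category"]
        else:
            item.setdefault("category", "uncategorized")
    return items


def deduplicate(items, attrs):
    """Drop records whose lowercased text has already been seen."""
    seen, unique = set(), []
    for item in items:
        digest = hashlib.sha256(item["text"].lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


# Registry mapping the hypothetical component types to their functions.
COMPONENTS = {"extract": extract, "categorize": categorize, "deduplicate": deduplicate}


def run_task(task_xml, pages):
    """Run the components in document order, feeding each one's output to the next."""
    root = ET.fromstring(task_xml.strip())
    items = list(pages)
    for node in root.findall("component"):
        items = COMPONENTS[node.get("type")](items, node.attrib)
    return items


PAGE = (
    "<html><body><h2>Football final tonight</h2>"
    "<h2>Budget vote delayed</h2>"
    "<h2>football final tonight</h2></body></html>"
)
records = run_task(TASK_XML, [PAGE])
for record in records:
    print(record["category"], "|", record["text"])
```

Keeping the pipeline order in the XML rather than in code is what makes the description declarative in this sketch: changing what is extracted or how it is categorized means editing the task document, not the interpreter.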

Bibliographic details

Source: Research Challenges in Information Science, 2009. RCIS 2009 | 2009 | pp. 225-236 | 12 pages
Conference location: Fez (MA)
Authors: Beno M.; Misek J.; Zavoral F.
Affiliation: Dept. of Software Eng., Charles Univ. in Prague, Prague
Original format: PDF
Keywords:
XML; information retrieval systems; meta data; semantic Web; software agents; AgentMat processing; Internet; MediaPub system; Web pages; World Wide Web; XML-based language; data scraping; information extraction; semantic metadata; semantization; system components; task description; categorizing; image duplicity check; multimedia database; web scraping;

Similar literature

1. A FRAMEWORK FOR ARCHITECTURAL HERITAGE HBIM SEMANTIZATION AND DEVELOPMENT [J]. Brusaporci S., Maiezza P., Tata A. International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 2018, No. 4
2. A Survey of Data Semantization in Internet of Things [J]. Feifei Shi, Qingjuan Li, Tao Zhu, Sensors. 2018, No. 1
3. What You Can Scrape and What Is Right to Scrape: A Proposal for a Tool to Collect Public Facebook Data [J]. Moreno Mancosu, Federico Vegetti. Social Media + Society. 2020, No. 3
4. AgentMat: Framework for Data Scraping and Semantization [C]. Miloslav Beno, Jakub Misek, Filip Zavoral. International Conference on Research Challenges in Information Science. 2009
5. Scraped data and prices in macroeconomics. [D]. Cavallo, Alberto F. 2010
6. A Survey of Data Semantization in Internet of Things [O]. Feifei Shi, Qingjuan Li, Tao Zhu, 2018
7. A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping [O]. Fernández Villamor José Ignacio, Blasco Garcia Jacobo, Iglesias Fernandez Carlos Angel, 2011
