DATA EXTRACTION FROM REPOSITORIES ON THE WEB: A SEMI-AUTOMATIC APPROACH

Coskun Bayrak; Hayrettin Kolukisaoglu; Steve Sieloff


Abstract

The World Wide Web (WWW) is becoming the most important source of information for business intelligence and information dissemination. Past information-gathering techniques like surfing and sifting are proving insufficient for processing the vast volumes of data readily available from the Web. In addition, companies are being forced to integrate this vast data repository within specific cost, time, and reliability constraints. This paper presents the fundamentals of a system called "Browser Harness" (B2H) that extracts the requested data from Web sites in a supervised fashion. The algorithmic background of this system is based on the tag structure of web pages, as HTML is the predominant choice for rendering web page content on the WWW. B2H is an interactive tool for harnessing data from semi-structured and structured web pages by analyzing the tag structure of the input page and locating the data in the HTML code. The extracted data is then exported to XML, delimited text, or database tables.
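The abstract does not give B2H's algorithm, so the following is only a minimal sketch of the general idea it describes: locate data through the tag structure of a page, then export it as delimited text or XML. Everything named here is an assumption made for illustration, including the class `RecordFieldExtractor`, the helpers `to_delimited` and `to_xml`, the sample page, and the choice of `tr`/`td` as the record and field tags. The "supervised" step is reduced to the user naming which tag wraps a record and which tag wraps a field.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class RecordFieldExtractor(HTMLParser):
    """Hypothetical helper: collect the text of every field tag,
    grouped by the enclosing record tag (both chosen by the user)."""

    def __init__(self, record_tag, field_tag):
        super().__init__()
        self.record_tag = record_tag
        self.field_tag = field_tag
        self.records = []      # completed records, each a list of strings
        self._current = None   # fields of the record being read
        self._buffer = None    # text pieces of the field being read

    def _flush_field(self):
        # Close the field in progress, if any, and store its text.
        if self._buffer is not None and self._current is not None:
            self._current.append("".join(self._buffer).strip())
        self._buffer = None

    def _flush_record(self):
        # Close the record in progress; records with no fields are skipped.
        self._flush_field()
        if self._current:
            self.records.append(self._current)
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == self.record_tag:
            self._flush_record()
            self._current = []
        elif tag == self.field_tag and self._current is not None:
            self._flush_field()
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.field_tag:
            self._flush_field()
        elif tag == self.record_tag:
            self._flush_record()


def to_delimited(records, delimiter=","):
    """Export records as delimited text, one record per line."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(records)
    return out.getvalue()


def to_xml(records, field_names):
    """Export records as XML, one <record> element per record."""
    root = ET.Element("records")
    for rec in records:
        node = ET.SubElement(root, "record")
        for name, value in zip(field_names, rec):
            ET.SubElement(node, name).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample page; the header row has no <td> cells and is skipped.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget A</td><td>9.50</td></tr>
  <tr><td>Widget B</td><td>12.00</td></tr>
</table>
"""

extractor = RecordFieldExtractor(record_tag="tr", field_tag="td")
extractor.feed(SAMPLE_HTML)
extractor.close()
records = extractor.records

delimited_text = to_delimited(records)
xml_text = to_xml(records, ["name", "price"])
```

Exporting to database tables, the third target mentioned in the abstract, would follow the same pattern: the `records` list can be passed to a parameterized `INSERT` through any DB-API driver.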


Bibliographic details

Source: Journal of integrated design & process science, 2003, Issue 4, 11 pages
Authors: Coskun Bayrak; Hayrettin Kolukisaoglu; Steve Sieloff
Original format: PDF
Language: English
Chinese Library Classification: Engineering design
Keywords: Data extraction; Web mining

