DATA EXTRACTION FROM REPOSITORIES ON THE WEB: A SEMI-AUTOMATIC APPROACH

Abstract

The World Wide Web (WWW) is becoming the most important source of information for business intelligence and information dissemination. Past information gathering techniques like surfing and sifting are proving insufficient for processing the vast volumes of data readily available from the Web. In addition, companies are being forced to integrate this vast data repository within specific cost, time, and reliability spectrums. This paper presents the fundamentals of a system called “Browser Harness” (B2H) that extracts the requested data from Web sites in a supervised fashion. The algorithmic background of this system is based on the tag structure of web pages, as HTML is the predominant choice for rendering web page content on the WWW. B2H is an interactive tool for harnessing data from semi-structured and structured web pages by analyzing the tag structure of the input page and locating the data in the HTML code. The extracted data is then exported to XML, delimited text, or database tables.
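
The general idea of locating data through the tag structure of a page and exporting it as delimited text can be illustrated with a small sketch. This is not the B2H algorithm, which the abstract does not specify. The class `TableRowExtractor`, the helpers `extract_rows` and `to_delimited`, and the sample page are all hypothetical names chosen for illustration, and the sketch assumes the data of interest sits in the cells of a simple, non-nested HTML table.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper: assumes flat (non-nested) tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the open cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left open by malformed markup


def extract_rows(html):
    """Return the table cells found in `html` as a list of rows."""
    parser = TableRowExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_delimited(rows, delimiter="|"):
    """Serialize rows as delimited text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample input, used only to exercise the sketch.
SAMPLE_PAGE = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Ada</td><td>Little Rock</td></tr>
  <tr><td>Grace</td><td>Beijing</td></tr>
</table>
"""

if __name__ == "__main__":
    print(to_delimited(extract_rows(SAMPLE_PAGE)), end="")
```

A supervised tool of the kind the abstract describes would presumably let the user point at the data of interest rather than hard-coding the tags; the fixed `tr`/`td`/`th` choice here is a simplification.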

Bibliographic details

Source
Integrated Design & Process Technology, vol. 1 (IDPT-Vol.1, 2005) | 2005 | pp. 13-23 | 11 pages
Conference location: Beijing (CN)
Authors
Coşkun Bayrak; Hayrettin Kolukısaoğlu; Steve Sieloff
Author affiliations

Computer Science Department, University of Arkansas at Little Rock, Little Rock, AR, U.S.A.;

Computer Science Department, University of Arkansas at Little Rock, Little Rock, AR, U.S.A.;

Acxiom Corporation, Little Rock, AR, U.S.A.;

Original format: PDF
Language: English
Keywords
Data extraction; web mining;


