Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System

C. I. Ezeife

Abstract

The process of extracting comparative, heterogeneous web content data (including derived and historical data) from related web pages is still in its infancy. Discovering potentially useful and previously unknown information or knowledge from web contents, for example answering a query such as "list all articles on 'Sequential Pattern Mining' written between 2007 and 2011, including title, authors, volume, abstract, paper, citation, and year of publication", would require finding the schemas of web documents from different web pages, performing web content data integration, and building a virtual or physical data warehouse before extracting and mining web content from the database. This paper proposes a technique for automatic web content data extraction, the WebOMiner system, which models web sites of a specific domain, such as Business to Customer (B2C) web sites, as object-oriented database schemas. Non-deterministic finite state automata (NFA) based wrappers for recognizing content types from this domain are then built and used to extract related contents from data blocks into an integrated database, ready for future second-level mining and deep knowledge discovery.
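
The abstract describes NFA-based wrappers that recognize content types in data blocks and load the matches into an object-oriented schema. The Python sketch below illustrates that general idea only, under loudly stated assumptions: the token alphabet (`image`, `title`, `brand`, `price`), the product pattern `image title [brand] price+`, the `Product` class, the `extract` helper and the in-memory list standing in for the integrated database are all hypothetical, not the WebOMiner system's actual wrappers or schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

EPS = ""  # label used for epsilon (empty) transitions

# HYPOTHETICAL NFA for a "product" content type: image title [brand] price+
# TRANSITIONS maps (state, label) -> set of possible next states.
TRANSITIONS: Dict[Tuple[int, str], Set[int]] = {
    (0, "image"): {1},
    (1, "title"): {2},
    (2, "brand"): {3},
    (2, EPS): {3},      # brand is optional
    (3, "price"): {4},
    (4, "price"): {4},  # one or more prices (e.g. list price and sale price)
}
START, ACCEPT = 0, {4}


def eps_closure(states: Set[int]) -> Set[int]:
    """Return all states reachable from `states` via epsilon transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in TRANSITIONS.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def accepts(labels: List[str]) -> bool:
    """Simulate the NFA by tracking the set of states it could be in."""
    current = eps_closure({START})
    for label in labels:
        nxt: Set[int] = set()
        for s in current:
            nxt |= TRANSITIONS.get((s, label), set())
        current = eps_closure(nxt)
        if not current:  # no live state left, so reject early
            return False
    return bool(current & ACCEPT)


@dataclass
class Product:
    """HYPOTHETICAL object-oriented schema class for a B2C product record."""
    image: str
    title: str
    brand: Optional[str]
    prices: List[str]


def extract(block: List[Tuple[str, str]]) -> Optional[Product]:
    """Wrap one data block of (content_type, value) tokens into a Product."""
    if not accepts([kind for kind, _ in block]):
        return None
    fields = {"brand": None, "prices": []}
    for kind, value in block:
        if kind == "price":
            fields["prices"].append(value)
        else:
            fields[kind] = value
    return Product(**fields)


# Usage: a plain list stands in for the integrated database.
blocks = [
    [("image", "tv.jpg"), ("title", "42in LED TV"), ("brand", "Acme"),
     ("price", "$499"), ("price", "$449")],
    [("image", "cam.jpg"), ("title", "Camera"), ("price", "$199")],
    [("title", "Site navigation"), ("text", "Home | Deals")],  # noise block
]
records = [extract(b) for b in blocks]
database = [r for r in records if r is not None]
print(len(database))  # 2: the navigation block is rejected
```

Tracking a set of live states instead of a single state is what makes this a direct NFA simulation. Optional and repeated fields, which are common in B2C listings, are easy to express with epsilon transitions and self-loops rather than by enumerating every layout variant.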

Bibliographic details

Source: International Journal of Data Warehousing and Mining, 2012, Issue 4, 21 pages
Author: C. I. Ezeife
Original format: PDF
Language: English
Chinese Library Classification: Mining engineering
Keywords: Non-Deterministic Finite Automata (NFA); Object Oriented Mining; Web Content Mining; Web Data Integration; Wrappers

Similar articles

1. Towards Comparative Mining of Web Document Objects with NFA: WebOMiner System [J]. C. I. Ezeife. International Journal of Data Warehousing and Mining, 2012, no. 4.

2. Web Video Object Mining: Expectation Maximization and Density Based Clustering of Web Video Metadata Objects [J]. Siddu P. Algur, Prashant Bhat. International Journal of Information Engineering and Electronic Business, 2016, no. 1.

3. Object search using object co-occurrence relations derived from web content mining [J]. Chumtong Puwanan, Mae Yasushi, Ohara Kenichi. Intelligent Service Robotics, 2014, no. 1.

4. The Mining and Extraction of Primary Informative Blocks and Data Objects from Systematic Web Pages [C]. Yi-Feng Tseng, Hung-Yu Kao. IEEE/WIC/ACM International Conference on Web Intelligence, 2006.

5. Comparative Mining of Multiple Web Data Source Contents with Object Oriented Model [D]. Alahmad, Yanal. 2013.

6. HOLON: extending Web document libraries via objects in order to support the health information infrastructure. Health Object Library Online [O]. B. G. Silverman, P. Jones, C. Safran. 1998.

7. WISDOM: Web intrapage informative structure mining based on document object model [O]. Hung-yu Kao, Jan-ming Ho, Ming-syan Chen. 2005.

8. A Note on Interfacing Object Warehouses and Mass Storage Systems for Data Mining Applications [R]. Grossman, Robert L., Northcutt, Dave. 1996.
