Applying Pattern Mining to Web Information Extraction

Abstract

Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction, such as WIEN, Stalker, and Softmealy, aims to solve this problem by applying machine learning to automatically generate extractors. However, this approach still requires human intervention to provide training examples. In this paper, we propose a novel approach to IE based on repeated pattern mining and multiple pattern alignment. Repeated patterns are discovered using a data structure called a PAT tree. In addition, incomplete patterns are further revised by pattern alignment to cover all pattern instances. This new approach to IE involves no human effort and no content-dependent heuristics. Experimental results show that the constructed extraction rules achieve 97 percent extraction over fourteen popular search engines.
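
As a rough illustration of the repeated pattern mining step, the sketch below finds token sequences that occur more than once in a tokenized page. It is not the paper's method: brute-force n-gram counting stands in for the PAT tree, the alignment step is omitted, and the function name `repeated_patterns`, its parameters, and the sample token list are all hypothetical.

```python
from collections import defaultdict

def repeated_patterns(tokens, min_len=2, min_count=2):
    """Map each repeated token n-gram to the list of its start positions.

    Assumption: plain n-gram counting is used here in place of a PAT tree.
    Occurrences are allowed to overlap.
    """
    found = {}
    for n in range(min_len, len(tokens)):
        positions = defaultdict(list)
        for i in range(len(tokens) - n + 1):
            positions[tuple(tokens[i:i + n])].append(i)
        for pattern, starts in positions.items():
            if len(starts) >= min_count:
                found[pattern] = starts
    return found

# Hypothetical tokenized result page: three records inside one list.
tokens = ["<ul>"] + ["<li>", "<a>", "TEXT", "</a>", "</li>"] * 3 + ["</ul>"]
found = repeated_patterns(tokens)

record = ("<li>", "<a>", "TEXT", "</a>", "</li>")
print(found[record])  # start positions of the record pattern
```

In this toy input the record pattern is reported once per list item, while sequences that appear only once, such as the opening `<ul>` followed by `<li>`, are not reported. Choosing among overlapping candidates and repairing incomplete patterns is what the alignment step in the abstract addresses.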

Bibliographic details

Source: Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2002, 12 pages
Authors: Chia-Hui Chang; Shao-Chen Lui; Yen-Chin Wu
Original format: PDF
Chinese Library Classification: TP311.13-532

