A novel ensemble vision based deep web data extraction technique for web mining applications

Abstract

Web content extraction is the task of extracting structured information from unstructured and semi-structured machine-readable documents. In most cases this involves processing human-language text by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction. Information retrieval, similarly, is a process driven by the user's query, and the retrieved information is then extracted using web content extraction. The challenges of this kind of web page content extraction are growing. In this work, we study the problem of automatically extracting content from web pages. Much research has addressed this problem, but the existing approaches have limitations: they cannot cope with large numbers of web pages, and they depend on the web page programming language (HTML). Our proposed work aims to overcome these limitations. It deals with an information retrieval process in which a vision-based approach is applied, which helps to extract both images and text from web pages. In fact, most studies show that when a page is presented to the user, spatial and visual features play a very important role because they help the user unconsciously divide the web page into several semantic parts. Hence, the proposed work focuses on the primary visual features of a web page, and extraction is carried out on the basis of these features. This approach can achieve better performance than traditional methods.
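The abstract does not describe the extraction algorithm itself, so the following is only a minimal sketch of the general idea of dividing a page into parts by spatial features. It assumes each rendered element is already available as a bounding box; the `Block` type, the `group_by_vertical_gap` function, the `max_gap` threshold, and the sample coordinates are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A rendered page element with its position and size in pixels (hypothetical)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def group_by_vertical_gap(blocks, max_gap=20):
    """Group blocks into visual regions.

    A new region starts whenever the vertical whitespace between a block
    and the lowest edge seen so far exceeds max_gap (an assumed threshold).
    """
    regions = []
    prev_bottom = None
    # Walk the blocks top to bottom, then left to right.
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        if prev_bottom is None or block.y - prev_bottom > max_gap:
            regions.append([])
        regions[-1].append(block)
        bottom = block.y + block.height
        prev_bottom = bottom if prev_bottom is None else max(prev_bottom, bottom)
    return regions


# Made-up layout: a header followed by two result records.
blocks = [
    Block("Site header", 0, 0, 800, 60),
    Block("Result 1 title", 40, 100, 600, 20),
    Block("Result 1 snippet", 40, 125, 600, 40),
    Block("Result 2 title", 40, 200, 600, 20),
    Block("Result 2 snippet", 40, 225, 600, 40),
]
regions = group_by_vertical_gap(blocks)
for i, region in enumerate(regions, start=1):
    print(i, [b.text for b in region])
```

The grouping uses only positions and sizes, never tag names, which is the sense in which a vision-based approach can avoid depending on the HTML markup. A real system would likely combine several features (font, color, alignment) rather than a single gap threshold.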


Bibliographic details

Source
2012 IEEE International Conference on Advanced Communication, Control and Computing Technologies | 2012 | pp. 110-114 | 5 pages
Conference location: Ramanathapuram (IN)
Authors
Aysha Banu B.; Chitra M.
Author affiliation

Department of Computer Science and Engineering, Mohamed Sathak Engineering College, Kilakarai, India

Original format: PDF
Text language: English (eng)
Chinese Library Classification: Wireless communication

Similar literature

1. Web mining and privacy concerns: Some important legal issues to be consider before applying any data and information extraction technique in web-based environments [J]. Juan D. Velasquez. Expert Systems with Applications. 2013, Issue 13.

2. DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining [J]. Aysha Banu and M. Chitra. Journal of Emerging Technologies in Web Intelligence. 2014, Issue 1.

3. ViDE: A Vision-Based Approach for Deep Web Data Extraction [J]. Liu Wei, Meng Xiaofeng, Meng Weiyi. IEEE Transactions on Knowledge and Data Engineering. 2010, Issue 3.

4. A novel ensemble vision based deep web data extraction technique for web mining applications [C]. Aysha Banu B., Chitra M. IEEE International Conference on Advanced Communication Control and Computing Technologies. 2012.

5. SensorWebIDS: A sensor with misuse and anomaly based data mining technique for web intrusion detection [D]. Dong, Jingyu. 2006.

6. SBMLmod: a Python-based web application and web service for efficient data integration and model simulation [O]. Sascha Schäuble, Anne-Kristin Stavrum, Mathias Bockwoldt, 2017

7. Transforming user data into user value by novel mining techniques for extraction of web content, structure and usage patterns. The Development and Evaluation of New Web Mining Methods that enhance Information Retrieval and improve the Understanding of User's Web Behavior in Websites and Social Blogs. [O]. Ammari Ahmad N. 2010.
