Automatic Web Content Extraction for Generating Tag Clouds from Thai Web Sites

Abstract

This paper proposes a novel Web content extraction approach based on heuristic rules and the XPath utility in XML. The main objective is to address the problem of Web visualization by generating tag clouds from Thai Web sites, providing an overview of the key words in the Web pages. The paper also proposes a detailed method for assessing a Web content extraction technique on a single Web page using the length of the extracted content. The proposed technique has three main steps: Web page element and feature extraction, block detection, and content extraction selection. Empirical results show that the technique achieves high accuracy.
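
The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual heuristic rules, so everything below is an assumption made for illustration: the sample page, the helper names (`text_of`, `block_features`, `select_content`, `tag_cloud`), and the rule "keep the block with the most non-link text". The sketch uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, which requires well-formed markup. It tokenizes English text with a regular expression; Thai is written without spaces between words, so a Thai page would need a word segmenter at that step.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed sample page (not taken from the paper).
PAGE = (
    "<html><body>"
    "<div id='nav'><a href='/'>Home</a> <a href='/news'>News</a></div>"
    "<div id='main'><p>Tag clouds give an overview of key words.</p>"
    "<p>Tag clouds scale words by frequency.</p></div>"
    "<div id='footer'><a href='/about'>About us</a></div>"
    "</body></html>"
)


def text_of(element):
    """Return all text inside an element with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def block_features(block):
    """Step 1 (assumed features): block text and length of its link text."""
    link_len = sum(len(text_of(a)) for a in block.findall(".//a"))
    return {"id": block.get("id"), "text": text_of(block), "link_len": link_len}


def select_content(root):
    """Steps 2 and 3 (assumed rule): treat every div found via XPath as a
    candidate block and keep the one with the most non-link text."""
    blocks = [block_features(b) for b in root.findall(".//div")]
    return max(blocks, key=lambda f: len(f["text"]) - f["link_len"])


def tag_cloud(text, top=3):
    """Count word frequencies; a tag cloud would size each word by its count."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top)


root = ET.fromstring(PAGE)
content = select_content(root)
cloud = tag_cloud(content["text"])
print(content["id"], cloud)
```

Navigation and footer blocks score low here because almost all of their text sits inside links, which is a common heuristic for boilerplate detection but not necessarily the one the authors used.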

Bibliographic details

Source: 8th IEEE International Conference on e-Business Engineering, 2011, pp. 85-89 (5 pages)
Authors: Thanadechteemapat Wigrai; Fung Chun Che
Original format: PDF
Chinese Library Classification: Computer networks; Electronic trade, online trade
Keywords: Tag clouds; Web Content Extraction; XPath

Similar literature

1. Automatic Data Extraction from Websites for Generating Aquatic Product Market Information [J]. YUAN Hong-chun, CHEN Ying, SUN Yue-fu. Journal of Dong Hua University. 2006, No. 6
2. A music information system automatically generated via Web content mining techniques [J]. Markus Schedl, Gerhard Widmer, Peter Knees, Information Processing & Management. 2011, No. 3
3. Generate Web Sites Automatically [J]. ROGER JENNINGS. Visual Studio Magazine. 2008, No. 8
4. Automatic Web Content Extraction for Generating Tag Clouds from Thai Web Sites [C]. Thanadechteemapat Wigrai, Fung Chun Che. IEEE International Conference on e-Business Engineering. 2011
5. An evaluation of the quality, readability and Canadian content of Canadian Web sites providing female urinary incontinence information and a brief examination of Web site interactivity [D]. Farrell, Karen D. 2005
6. A modified niche model for generating food webs with stage‐structured consumers: The stabilizing effects of life‐history stages on complex food webs [O]. Etsuko Nonaka, Anna Kuparinen. 2021
7. Automatic web content extraction for generating tag clouds from Thai web sites [O]. Thanadechteemapat W., Fung C.C. 2011
