Incremental Web Page Template Detection by Text Segments

Abstract

Template detection is important for many applications. Most template detection methods use content repetition as a hint for identifying template blocks, which requires a large number of web pages as input. They therefore usually process web pages in batches, so a newly crawled page cannot be processed until enough pages have been collected. This consumes a large amount of storage for caching web pages and causes a long delay in data refreshing. In this paper, we present an incremental framework for template detection in which a page is processed as soon as it has been crawled. Under this framework, no web page needs to be cached. Experiments show that our framework consumes less than 7% of the storage used by traditional methods. The delay in data refreshing induced by batch processing is also completely eliminated.
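The abstract does not describe the algorithm itself, so the following is only a minimal sketch of what an incremental, cache-free detector could look like, not the authors' method. It assumes that a "text segment" is a whitespace-normalized line of page text and that a segment counts as template once it has appeared on a large enough share of the pages seen so far. The class name `IncrementalTemplateDetector` and the parameters `min_pages` and `min_ratio` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


class IncrementalTemplateDetector:
    """Hypothetical sketch: stores per-segment counters only, never the pages."""

    def __init__(self, min_pages=3, min_ratio=0.5):
        self.min_pages = min_pages    # assumed warm-up before anything is flagged
        self.min_ratio = min_ratio    # assumed share of pages a template segment must hit
        self.pages_seen = 0
        self.segment_counts = defaultdict(int)  # fingerprint -> pages containing it

    @staticmethod
    def _segments(page_text):
        # Assumption: one text segment per non-empty line, whitespace normalized.
        parts = (re.sub(r"\s+", " ", line).strip() for line in page_text.splitlines())
        return [p for p in parts if p]

    @staticmethod
    def _fingerprint(segment):
        # A fixed-size digest keeps memory independent of segment length.
        return hashlib.sha256(segment.encode("utf-8")).hexdigest()

    def process(self, page_text):
        """Update the counters with one page; return (content, template) segments."""
        segments = self._segments(page_text)
        self.pages_seen += 1
        # Count each distinct segment at most once per page.
        for fp in {self._fingerprint(s) for s in segments}:
            self.segment_counts[fp] += 1

        content, template = [], []
        for s in segments:
            count = self.segment_counts[self._fingerprint(s)]
            is_template = (
                self.pages_seen >= self.min_pages
                and count / self.pages_seen >= self.min_ratio
            )
            (template if is_template else content).append(s)
        return content, template


detector = IncrementalTemplateDetector()
pages = [
    "Home | About\nFirst article body\nCopyright 2008",
    "Home | About\nSecond article body\nCopyright 2008",
    "Home | About\nThird article body\nCopyright 2008",
]
results = [detector.process(p) for p in pages]
```

In this sketch each page is handled the moment it arrives and can be discarded afterwards; only the counter table persists. On the third page, the shared navigation and copyright lines are reported as template and "Third article body" as content. Pages seen during the warm-up are not re-labeled, which is one trade-off of this simplified design.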

Bibliographic details

Source: IEEE International Workshop on Semantic Computing and Systems, 2008, 7 pages
Authors: Yu Wang; Bingxing Fang; Xueqi Cheng; Li Guo; Hongbo Xu
Original format: PDF
Chinese Library Classification: TP301-53

Similar literature

1. Template Extraction from Heterogeneous Web Pages Using Text Clustering [J]. T.L.N. Divya, G. Loshma, Nagaratna P. Hegde. International Journal of Computer Trends and Technology, 2012, No. 3.
2. TEXT: Automatic Template Extraction from Heterogeneous Web Pages [J]. Kim Chulyun, Shim Kyuseok. IEEE Transactions on Knowledge and Data Engineering, 2011, No. 4.
3. Topical Web Crawling Using Weighted Anchor Text and Web Page Change Detection Techniques [J]. Divakar Yadav, A.K. Sharma, J.P. Gupta. WSEAS Transactions on Information Science and Applications, 2009, No. 1a3.
4. Incremental Web Page Template Detection by Text Segments [C]. Yu Wang, Bingxing Fang, Xueqi Cheng, et al. IEEE International Workshop on Semantic Computing and Systems, 2008.
5. Topic Modeling and Spam Detection for Short Text Segments in Web Forums [D]. Sun, Yingcheng. 2020.
6. Nodule Detection in a Lung Region thats Segmented with Using Genetic Cellular Neural Networks and 3D Template Matching with Fuzzy Rule Based Thresholding [O]. Serhat Ozekes, Onur Osman, Osman N. Ucan. 2008.
7. Incremental web page template detection [O]. Yu Wang, Bingxing Fang, Xueqi Cheng, et al. 2008.
