Leveraging Web 2.0 Sources for Web Content Classification

Abstract

This paper addresses practical aspects of web page classification not captured by the classical text mining framework. Classifiers are supposed to perform well on a broad variety of pages. We argue that constructing training corpora is a bottleneck for building such classifiers, and that care has to be taken if the goal is to generalize to previously unseen kinds of pages on the web. We study techniques for building training corpora automatically from publicly available web resources, quantify the discrepancy between them, and demonstrate that encouraging agreement between classifiers given such diverse sources drastically outperforms methods that ignore the different natures of data sources on the web.
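The abstract names two technical steps: quantifying how far apart training corpora harvested from different web sources are, and encouraging classifiers trained on those sources to agree. The sketch below illustrates both ideas under explicit assumptions and is not the paper's method. It measures corpus discrepancy with the Jensen-Shannon divergence between term distributions. It stands in for "encouraging agreement" with a simple agreement-filtered pseudo-labeling loop in the spirit of co-training: two multinomial naive Bayes classifiers, each trained on one source, label unseen target pages, and only pages on which both agree are added back to training. The function names (`js_divergence`, `agreement_training`), the hypothetical `bookmarks` and `directory` sources, the toy data, and the choice of naive Bayes are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; a real web classifier would also
    # strip markup, boilerplate, and stop words.
    return text.lower().split()


def term_distribution(docs):
    """Relative term frequencies over a whole corpus."""
    counts = Counter(w for d in docs for w in tokenize(d))
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint vocabularies."""
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in set(p) | set(q)}

    def kl_to_m(a):
        return sum(pa * math.log2(pa / m[w]) for w, pa in a.items() if pa > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab_size = len({w for c in self.classes for w in self.word_counts[c]})
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(self.doc_counts[c] / n_docs)
            for w in tokenize(doc):
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + self.vocab_size))
            if score > best_score:
                best, best_score = c, score
        return best


def agreement_training(source_a, source_b, unlabeled, rounds=3):
    """Hypothetical agreement loop: source_a and source_b are lists of
    (text, label) pairs harvested from two different web sources. Each round,
    target pages on which both source-specific classifiers agree are
    pseudo-labeled and added to both training sets; disagreements stay out."""
    a, b, pool = list(source_a), list(source_b), list(unlabeled)
    for _ in range(rounds):
        clf_a = NaiveBayes().fit(*zip(*a))
        clf_b = NaiveBayes().fit(*zip(*b))
        agreed, pool_next = [], []
        for doc in pool:
            label_a, label_b = clf_a.predict(doc), clf_b.predict(doc)
            if label_a == label_b:
                agreed.append((doc, label_a))
            else:
                pool_next.append(doc)
        if not agreed:
            break
        a += agreed
        b += agreed
        pool = pool_next
    # The final model pools both (now partly shared) training sets.
    return NaiveBayes().fit(*zip(*(a + b)))


# Toy usage with two hypothetical sources and unlabeled target pages.
bookmarks = [("football match goal score", "sports"), ("election vote parliament", "politics")]
directory = [("team league goal", "sports"), ("government minister vote", "politics")]
target_pages = ["goal team match", "minister election parliament"]

print(js_divergence(term_distribution(t for t, _ in bookmarks),
                    term_distribution(t for t, _ in directory)))
clf = agreement_training(bookmarks, directory, target_pages)
print(clf.predict("league match goal"))  # sports
```

The agreement filter is the design choice that matters here: a page is only trusted as extra training data when classifiers built from differently natured sources reach the same label, which is one simple way to keep the bias of any single source from propagating. Published co-training and co-regularization methods use richer criteria (confidence thresholds, joint objectives) than this sketch.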

Bibliographic details

Source: IEEE/WIC/ACM Joint International Conference on Web Intelligence and Intelligent Agent Technology | 2008 | 7 pages
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: corpus construction; text mining; web 2.0; web classification

Similar literature

1. Leveraging Web 2.0 technologies to add value to the IUPAC Solubility Data Series: development of a REST style website and application programming interface (API) [J]. Stuart J. Chalk. Pure and Applied Chemistry, 2015, no. 11-12.

2. Web 2.0 Proxy: Upgrading Websites from Web 1.0 to Web 2.0 [J]. Yung-Wei Kao, Ming-Chih Hsieh, Sheau-Ling Hsieh. WSEAS Transactions on Communications, 2008, no. 4-6.

3. A Comprehensive Analysis of Academic Library Websites: Design, Navigation, Content, Services, and Web 2.0 Tools [J]. Charlene L Al-Qallaf, Alaa Ridha. The International Information & Library Review, 2019, no. 2.

4. Leveraging Web 2.0 Sources for Web Content Classification [C]. IEEE/WIC/ACM Joint International Conference on Web Intelligence and Intelligent Agent Technology, 2008.

5. Leveraging open source web resources to improve retrieval of low text content items [D]. Ayush Singhal, 2014.

6. Teaching Web 2.0 technologies using Web 2.0 technologies [O]. Melissa L. Rethlefsen, Mary Piorun, J. Dale Prince, 2009.

7. Review of „Brand-urile în era Web 2.0. Conținutul generat de consumatori” (Web 2.0 Brands. User-generated Content) by Rodica Săvulescu, București: Tritonic, 2016, 252 p. [O]. Alexandra Vițelar, Florența Toader, 2017.
