Automated Classification of Web Documents into a Hierarchy of Categories

Abstract

This paper investigates the problem of classifying HTML documents into a hierarchy of categories in the context of a cooperative information repository named WebClassII. The hierarchy of categories is involved in all aspects of automated document classification, namely feature extraction, learning, and classification of a new document. The innovative aspects of this work are: a) an experimental study on actual Web documents, which can be associated with any node in the hierarchy; b) the feature selection process; c) the automated selection of thresholds for the score returned by a classifier; d) the comparison of three different techniques (flat, hierarchical with proper training sets, hierarchical with hierarchical training sets); e) the definition of new measures for evaluating system performance. Results show that the use of hierarchical training sets improves the hierarchical techniques.
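
To make the idea of threshold-based classification into a hierarchy more concrete, the following is a minimal sketch of top-down routing in which a document may stop at any node, including an internal one. It is an illustration under stated assumptions, not the WebClassII implementation: the `Category` class, the `keyword_scorer` helper, the toy hierarchy, and the hand-set thresholds are all hypothetical (the paper selects thresholds automatically, which is not reproduced here).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

# A document is modelled as a bag of words (word -> count). Assumption for this sketch.
Document = Dict[str, int]
Scorer = Callable[[Document], float]


@dataclass
class Category:
    """A node in the category hierarchy (hypothetical structure)."""
    name: str
    scorer: Optional[Scorer] = None      # the root needs no scorer
    threshold: float = 0.5               # hand-set here; assumed, not learned
    children: List["Category"] = field(default_factory=list)


def keyword_scorer(keywords: Iterable[str]) -> Scorer:
    """Toy scorer: fraction of the document's word occurrences that are keywords."""
    vocab = set(keywords)

    def score(doc: Document) -> float:
        total = sum(doc.values())
        if total == 0:
            return 0.0
        hits = sum(count for word, count in doc.items() if word in vocab)
        return hits / total

    return score


def classify(doc: Document, root: Category) -> Category:
    """Descend from the root while some child's score reaches its threshold.

    The document is assigned to the last node reached, which may be an
    internal node when no child accepts it.
    """
    node = root
    while node.children:
        scored = [(child.scorer(doc), child) for child in node.children]
        passing = [(s, c) for s, c in scored if s >= c.threshold]
        if not passing:
            break  # no child accepts the document: stay at this node
        node = max(passing, key=lambda pair: pair[0])[1]
    return node


# Toy hierarchy (illustrative only).
physics = Category("physics", keyword_scorer(["quantum", "particle"]), 0.3)
biology = Category("biology", keyword_scorer(["cell", "gene"]), 0.3)
science = Category("science", keyword_scorer(["experiment", "physics", "biology"]),
                   0.3, [physics, biology])
sports = Category("sports", keyword_scorer(["football", "match"]), 0.3)
root = Category("root", children=[science, sports])

leaf_doc = {"experiment": 2, "quantum": 2, "particle": 1, "the": 1}
internal_doc = {"experiment": 3, "results": 2, "method": 1}

print(classify(leaf_doc, root).name)
print(classify(internal_doc, root).name)
```

In this sketch, the first document is routed down to a leaf, while the second passes the threshold for `science` but for neither of its children, so it remains at that internal node. A flat classifier, by contrast, would score every category independently with no descent.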

Bibliographic details

Source: Intelligent Information Processing and Web Mining; Advances in Soft Computing, 2003, pp. 59-68 (10 pages)
Authors: Michelangelo Ceci; Floriana Esposito; Michele Lapi; Donato Malerba
Original format: PDF
Classification (Chinese Library Classification): Computer networks
Keywords: web content mining; hierarchical document classification

