Clustering Documents in a Web Directory

Abstract

Hierarchical categorization of documents is receiving growing interest due to the proliferation of topic hierarchies for text documents. The main drawback of hierarchical supervised classifiers is their high demand for labeled examples, the number of which is related to the number of topics in the taxonomy. Hence, bootstrapping a huge hierarchy with a proper set of labeled examples is a critical issue. In this paper, we propose some solutions to the bootstrapping problem that implicitly or explicitly use a taxonomy definition: a baseline approach, where documents are classified according to class labels, and two clustering approaches, where training is constrained by a priori knowledge of the taxonomy structure at both the terminological and the topological level. In particular, we propose the TaxSOM model, which clusters a set of documents in a predefined hierarchy of classes, directly exploiting knowledge of both their topological organization and their lexical description. Experimental evaluation was performed on a set of taxonomies taken from the Google Web directory.
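The baseline idea mentioned in the abstract, classifying documents according to class labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the cosine scoring, the function name classify_by_labels, and the toy taxonomy and documents are all assumptions made here for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_by_labels(documents, taxonomy):
    """Assign each document to the node whose label terms it matches best.

    Documents that share no term with any node label stay unassigned (None).
    """
    node_vectors = {
        node: Counter(tokenize(" ".join(labels)))
        for node, labels in taxonomy.items()
    }
    assignments = {}
    for doc_id, text in documents.items():
        doc_vector = Counter(tokenize(text))
        scores = {node: cosine(doc_vector, vec) for node, vec in node_vectors.items()}
        best = max(scores, key=scores.get)
        assignments[doc_id] = best if scores[best] > 0 else None
    return assignments


# Hypothetical toy taxonomy: node path -> label keywords (not from the paper).
taxonomy = {
    "Arts/Music": ["music", "bands", "instruments"],
    "Science/Physics": ["physics", "quantum", "relativity"],
}
# Hypothetical unlabeled documents.
documents = {
    "d1": "A guide to jazz bands and their instruments",
    "d2": "Lecture notes on quantum physics",
    "d3": "Recipes for the weekend",
}

assignments = classify_by_labels(documents, taxonomy)
for doc_id, node in assignments.items():
    print(doc_id, "->", node)
```

A sketch like this uses only the lexical description of each class. The clustering approaches described in the abstract additionally constrain training with the topological structure of the taxonomy, which this example does not attempt to model.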

Bibliographic details

Source
ACM (Association for Computing Machinery) International Workshop on Web Information and Data Management (WIDM 2003); November 7-8, 2003; New Orleans, LA, US | 2003 | pp. 66-73 | 8 pages
Conference location: New Orleans, LA (US)
Authors
Giordano Adami; Paolo Avesani; Diego Sona
Author affiliation

ITC-irst, via Sommarive 18, 38050 Povo, Italy

Original format: PDF
Language of text: English
Chinese Library Classification: Computer networks
Keywords
web directories; TaxSOM; constrained clustering; k-means; taxonomy bootstrapping process; text categorization; knowledge management; digital libraries;

Similar literature

1. Clustering documents into a web directory for bootstrapping a supervised classification [J]. Giordano Adami, Paolo Avesani, Diego Sona. Data & Knowledge Engineering. 2005, Issue 3
2. Developing a specialized directory system by automatically classifying Web documents [J]. Young Mee Chung, Young-Hee Noh. Journal of Information Science. 2003, Issue 2
3. WEB DOCUMENT CLUSTERING THROUGH METAFILE GENERATION FOR DIGRAPH STRUCTURE USING DOCUMENT INDEX GRAPH [J]. BUDI, SRI NURDIATI, BIB PARUHUM SILALAHI. Journal of Theoretical and Applied Information Technology. 2014, Issue 1
4. Improving Text Document Clustering by Exploiting Open Web Directory [C]. Gaurav Ruhela, P. Krishna Reddy. International Conference on Software Engineering and Knowledge Engineering. 2012
5. Clustering Web documents: A phrase-based method for grouping search engine results [D]. Zamir, Oren Eli. 1999
6. Desktop document delivery using portable document format (PDF) files and the Web [O]. J P Shipman, W L Gembala, J M Reeder. 1998
7. Clustering documents in a web directory [O]. Giordano Adami, Paolo Avesani, Diego Sona. 2011
8. Web Document Clustering Using Hyperlink Structures [R]. He, X., Zha, H., Ding, C. H. Q. 2003
