Generating Category Hierarchy for Classifying Large Corpora

Fumiyo FUKUMOTO; Yoshimi SUZUKI

Abstract

We address the problem of dealing with large collections of data and investigate the use of automatically constructed, domain-specific category hierarchies to improve text classification. We use two well-known techniques to create the category hierarchy: the partitioning clustering method k-means and a loss function. At each step of constructing the hierarchical structure, k-means iterates over the data that the system is permitted to classify. In general, the number of clusters k is not known beforehand. We therefore select an appropriate k using a loss function that measures the degree of disappointment at any difference between the true distribution over inputs and the learner's prediction. Once the optimal k has been selected, the procedure is repeated for each cluster. Our evaluation on the 1996 Reuters corpus, which consists of 806,791 documents, showed that automatically constructing hierarchies improves classification accuracy.
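The abstract outlines a recursive loop: partition the data with k-means, choose k by minimizing a loss, then repeat inside each cluster. The Python sketch below illustrates that loop under stated assumptions and is not the authors' implementation. Documents are assumed to be already mapped to dense numeric vectors (the representation is not described here). The loss is approximated by a BIC/MDL-style penalized log loss of a hard-assignment spherical Gaussian model, which is one possible reading of the "log loss function" keyword; the paper's exact criterion may differ. The helpers `kmeans`, `penalized_log_loss`, `select_k`, and `build_hierarchy`, and their parameters `max_k`, `min_size`, and `max_depth`, are hypothetical.

```python
import math
import random

# Sketch only: the penalized log loss below is an assumed stand-in for the
# paper's loss function, and all names/parameters are hypothetical.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def penalized_log_loss(points, k):
    """Negative log-likelihood of a hard-assignment spherical Gaussian model
    fitted by k-means, plus an assumed BIC/MDL-style complexity penalty."""
    n, d = len(points), len(points[0])
    centroids, labels = kmeans(points, k)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # shared-variance MLE, floored to avoid log(0)
    counts = [labels.count(j) for j in range(k)]
    # Log loss: -sum_i log P(x_i), with P = cluster weight * Gaussian density.
    nll = -sum(c * math.log(c / n) for c in counts if c > 0)
    nll += 0.5 * n * d * math.log(2 * math.pi * var) + sse / (2 * var)
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, variance
    return nll + 0.5 * n_params * math.log(n)


def select_k(points, max_k):
    """Pick the k in 1..max_k with the smallest penalized log loss."""
    candidates = range(1, min(max_k, len(points)) + 1)
    return min(candidates, key=lambda k: penalized_log_loss(points, k))


def build_hierarchy(points, max_k=5, min_size=4, depth=0, max_depth=3):
    """Recursively split: choose k, run k-means, then recurse into each cluster."""
    node = {"size": len(points), "children": []}
    if len(points) < min_size or depth >= max_depth:
        return node
    k = select_k(points, max_k)
    if k == 1:
        return node  # no split is preferred, so this node is a leaf
    _, labels = kmeans(points, k)
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            node["children"].append(
                build_hierarchy(members, max_k, min_size, depth + 1, max_depth))
    return node


if __name__ == "__main__":
    # Toy data (not from the paper): two well-separated 5x5 grids in 2-D.
    grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
    points = grid + [(x + 10, y + 10) for x, y in grid]
    tree = build_hierarchy(points, max_k=4)
    print(select_k(points, 4), sorted(c["size"] for c in tree["children"]))
```

The complexity penalty is the key design choice in this sketch: unpenalized log loss on the training data tends to keep falling as k grows, so on its own it would favor ever more clusters.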

Bibliographic Details

Source: IEICE Transactions on Information and Systems, 2006, Issue 4, pp. 1543-1554 (12 pages)
Authors: Fumiyo FUKUMOTO; Yoshimi SUZUKI
Affiliation: Interdisciplinary Graduate School of Medicine and Engineering, Kofu-shi, 400-8511 Japan
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics and telecommunications technology
Keywords: category hierarchies; k-means; log loss function

Similar Documents

1. Use of hierarchical cluster analysis to classify prisons in Ireland into mutually exclusive drug-use risk categories [J]. Codd Mary, Mehegan John, Kelleher Cecily. Drugs: education, prevention, and policy, 2016, No. 2.
2. Cost-sensitive learning of hierarchical tree classifiers for large-scale image classification and novel category detection [J]. Fan Jianping, Zhang Ji, Mei Kuizhi. Pattern Recognition: The Journal of the Pattern Recognition Society, 2015, No. 5.
3. Classifying web documents in a hierarchy of categories: a comprehensive study [J]. Michelangelo Ceci, Donato Malerba. Journal of Intelligent Information Systems, 2007, No. 1.
4. LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora [C]. Chien-Chung Huang, Shui-Lung Chuang, Lee-Feng Chien. International World Wide Web Conference, 2004.
5. Induced model category structures on categories of internal abelian group objects in cofibrantly generated model categories [D]. Mathey, Phillipp, 2010.
6. A Linear-RBF Multikernel SVM to Classify Big Text Corpora [O]. R. Romero, E. L. Iglesias, L. Borrajo.
7. Classifying Web Documents in a Hierarchy of Categories: A Comprehensive Study [O]. CECI M, MALERBA D, 2007.
