Conference: International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2006)

Document Representations for Classification of Short Web-Page Descriptions

Abstract

Motivated by the application of Text Categorization to sorting Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers: Naive Bayes, SVM, Voted Perceptron, kNN, and C4.5. The texts are short Web-page descriptions from the dmoz Open Directory Web-page ontology. Different transformations of the input data (stemming, normalization, logtf, and idf), together with dimensionality reduction, are found to have a statistically significant improving or degrading effect on classification performance as measured by the classical metrics: accuracy, precision, recall, F_1, and F_2. The emphasis of the study is not on determining the best document representation for each classifier, but rather on describing the effect of each individual transformation on classification, together with their mutual relationships.
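
To make the transformations named in the abstract concrete, the following is a minimal Python sketch of how logtf, idf, and normalization might be applied to a bag-of-words representation, together with the F_beta measure that yields F_1 and F_2. The abstract does not give the exact formulas used in the paper, so the definitions here are assumptions based on common variants: logtf as log(1 + tf), idf as log(N / df), and normalization to unit Euclidean length. All function names are illustrative. Stemming is omitted because it needs an external stemmer.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return one raw term-frequency Counter per document.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    return [Counter(doc.lower().split()) for doc in docs]


def logtf(tf):
    """Dampen raw counts: tf -> log(1 + tf) (assumed variant)."""
    return {term: math.log(1 + count) for term, count in tf.items()}


def idf_weights(bags):
    """Compute idf(t) = log(N / df(t)) (assumed variant).

    df(t) is the number of documents that contain term t.
    """
    n_docs = len(bags)
    doc_freq = Counter(term for bag in bags for term in bag)
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def apply_idf(vec, idf):
    """Multiply every term weight by that term's idf."""
    return {term: weight * idf[term] for term, weight in vec.items()}


def normalize(vec):
    """Scale a vector to unit Euclidean length; zero vectors are unchanged."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    if norm == 0:
        return dict(vec)
    return {term: w / norm for term, w in vec.items()}


def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    beta=1 gives F_1; beta=2 gives F_2, which weights recall more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical short descriptions, not taken from the dmoz data.
    docs = ["open directory web page", "web search web results"]
    bags = bag_of_words(docs)
    idf = idf_weights(bags)
    for bag in bags:
        print(normalize(apply_idf(logtf(bag), idf)))
```

Each transformation is a separate function, so any combination can be switched on or off independently, which mirrors the study's focus on the effect of individual transformations rather than on one fixed pipeline.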
