Conference: IEEE/WIC/ACM International Conference on Web Intelligence

An unsupervised hierarchical approach to document categorization

Abstract

We propose a hierarchical approach to document categorization that requires no pre-configuration and maps the semantic document space to a predefined taxonomy. Using search engines to train the hierarchical classifier makes our approach more flexible than existing solutions, which rely on human-labeled data and are bound to a specific domain. We show that the structural information provided by the taxonomy enables context-aware construction of search queries, which leads to higher tagging accuracy. We test our approach on several benchmark datasets and evaluate its performance on the single-tag and multi-tag assignment tasks. The experimental results show that our solution is as accurate as supervised classifiers for web page classification and still performs well when categorizing domain-specific documents.
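The abstract gives no implementation details, so the following is only a toy sketch of the general idea under assumptions of our own, not the authors' method. It assumes a hypothetical six-node taxonomy, a tiny in-memory snippet list (`WEB`) standing in for a real search engine (a snippet matches only if it contains every query term), one bag-of-words centroid per taxonomy node, and greedy top-down descent by cosine similarity. The names `TAXONOMY`, `WEB`, `build_query`, `search`, `train` and `classify` are all illustrative. The "context-aware" query is built by appending a node's ancestor labels to its own label.

```python
import math
import re
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); top-level nodes map to None.
TAXONOMY = {
    "Science": None,
    "Sports": None,
    "Physics": "Science",
    "Biology": "Science",
    "Football": "Sports",
    "Tennis": "Sports",
}

# Hypothetical stand-in for the web: a handful of result snippets.
WEB = [
    "physics science studies energy matter and quantum particles",
    "biology science studies cells genes and living organisms",
    "football sports match goal striker league",
    "tennis sports match racket serve grand slam",
    "science news about research and experiments",
    "sports news about teams and athletes",
    "tennis elbow treatment in medicine",
]


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_query(node, taxonomy):
    """Context-aware query: the node label followed by all ancestor labels."""
    terms = []
    while node is not None:
        terms.append(node)
        node = taxonomy[node]
    return " ".join(terms)


def search(query, k=3):
    """Toy search engine: return up to k snippets containing every query term."""
    terms = set(tokenize(query))
    return [s for s in WEB if terms <= set(tokenize(s))][:k]


def train(taxonomy):
    """Build one bag-of-words centroid per node from its search results."""
    centroids = {}
    for node in taxonomy:
        vec = Counter()
        for snippet in search(build_query(node, taxonomy)):
            vec.update(tokenize(snippet))
        centroids[node] = vec
    return centroids


def cosine(a, b):
    # Counter returns 0 for missing terms, so absent words contribute nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document, taxonomy, centroids):
    """Greedy top-down descent: at each level pick the most similar child."""
    doc = Counter(tokenize(document))
    path, current = [], None
    while True:
        children = [n for n, parent in taxonomy.items() if parent == current]
        if not children:
            return path
        current = max(children, key=lambda n: cosine(doc, centroids[n]))
        path.append(current)


if __name__ == "__main__":
    centroids = train(TAXONOMY)
    print(classify("a quantum particles experiment measuring energy", TAXONOMY, centroids))
    # → ['Science', 'Physics']
```

In this toy setting the query "Tennis" also retrieves the off-topic medicine snippet, while the context-aware query "Tennis Sports" does not, so the Tennis centroid is trained only on sports text. This is a small-scale analogue of the benefit the abstract attributes to using the taxonomy's structure when building queries. Multi-tag assignment (for example, keeping every child above a similarity threshold instead of only the best one) is not shown.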