Developing a specialized directory system by automatically classifying Web documents


Abstract

This study developed a specialized directory system using an automatic classification technique. Economics was selected as the subject field for the classification experiments with Web documents. The classification scheme of the directory follows the DDC, and subject terms representing each class number or subject category were selected from the DDC table to construct a representative term dictionary. In collecting and classifying the Web documents, various strategies were tested in order to find the optimal thresholds. In the classification experiments, Web documents in economics were classified into a total of 757 hierarchical subject categories built from the DDC scheme. The first and second experiments using the representative term dictionary resulted in relatively high precision ratios of 77 and 60%, respectively. The third experiment employing a machine learning-based k-nearest neighbours (kNN) classifier in a closed experimental setting achieved a precision ratio of 96%. This implies that it is possible to enhance the classification performance by applying a hybrid method combining a dictionary-based technique and a kNN classifier.
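The hybrid idea in the last sentence of the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the scoring function, the threshold values, or how the two techniques were combined. The sketch assumes a document is first matched against the representative terms and falls back to kNN only when no category reaches the threshold. The terms in `TERM_DICTIONARY`, the labelled examples, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical representative terms per DDC class number.
# These are illustrative only and are NOT taken from the paper's dictionary.
TERM_DICTIONARY = {
    "331": {"labor", "wages", "employment", "unions"},
    "332": {"banking", "credit", "investment", "interest"},
    "336": {"taxation", "budget", "revenue", "debt"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_classify(text, threshold=2):
    """Return the class with the most distinct term matches,
    or None when the best score is below the threshold (assumed rule)."""
    tokens = set(tokenize(text))
    scores = {c: len(terms & tokens) for c, terms in TERM_DICTIONARY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_classify(text, labelled, k=3):
    """Majority vote among the k labelled documents most similar to the text."""
    query = Counter(tokenize(text))
    ranked = sorted(
        labelled,
        key=lambda item: cosine(query, Counter(tokenize(item[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hybrid_classify(text, labelled, threshold=2, k=3):
    """Try the dictionary first; fall back to kNN if it is not confident."""
    label = dictionary_classify(text, threshold)
    return label if label is not None else knn_classify(text, labelled, k)


# Hypothetical labelled training documents for the kNN fallback.
LABELLED = [
    ("minimum wage and employment trends", "331"),
    ("job market employment report", "331"),
    ("central bank interest rate decision", "332"),
]
```

Calling `hybrid_classify("Banking and credit markets", LABELLED)` is settled by the dictionary alone, while a text with only one matching term, such as "employment outlook for graduates", is passed to the kNN step. Raising `threshold` sends more documents to kNN, which is one way the threshold tuning mentioned in the abstract could trade coverage for precision.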

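Bibliographic details

  • Source
    Journal of Information Science | 2003, No. 2 | pp. 117-126 | 10 pages
  • Authors

    Young Mee Chung; Young-Hee Noh

  • Affiliation

    Department of Library and Information Science, Yonsei University, 134 Shinchon-Dong, Seodaemun-Gu, Seoul, Korea

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Information science and information work
  • Date added to database: 2022-08-17 23:21:22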
