Conference: IEEE International Conference on Systems, Man, and Cybernetics

Multilingual and Hierarchical Classification of Large Datasets of Scientific Publications


Abstract

This paper proposes a classification system composed of monolingual classifiers and a multilingual decision module, designed to handle large numbers of multilingual documents. The system was compared with two monolingual classifiers, one for English and one for Polish, and with the maximum probability model. The tests were carried out on multilingual documents containing components in two languages, English and Polish. The authors conclude that the proposed system can efficiently categorize a large number of documents that cover assorted topics and simultaneously contain components from many languages. Additional objectives were to examine two ways of representing the data, as well as hierarchical and flat (horizontal) approaches to classification, assuming that the structure of classes is hierarchical. The results showed that representing a document as separate features is better than a bag of words, and that the flat approach is only slightly better than the hierarchical approach.
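
The abstract names a maximum probability model as a baseline but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: each monolingual classifier returns class probabilities for its language component, and the most confident single prediction wins. The function name, the language codes, and the example probabilities are hypothetical and do not come from the paper.

```python
from typing import Dict, Tuple


def max_probability_decision(
    per_language_probs: Dict[str, Dict[str, float]]
) -> Tuple[str, str, float]:
    """Return (language, label, probability) of the most confident prediction.

    Assumption: this is one possible interpretation of a "maximum
    probability" rule, not the paper's confirmed definition.
    """
    best = None
    for language, probs in per_language_probs.items():
        for label, p in probs.items():
            # Keep the highest probability seen across all classifiers.
            if best is None or p > best[2]:
                best = (language, label, p)
    if best is None:
        raise ValueError("no predictions supplied")
    return best


# Hypothetical outputs of two monolingual classifiers for one document.
example = {
    "en": {"physics": 0.55, "chemistry": 0.45},
    "pl": {"physics": 0.20, "chemistry": 0.80},
}
language, label, prob = max_probability_decision(example)
print(language, label, prob)
```

A rule like this ignores agreement between classifiers, which is presumably the kind of gap a dedicated multilingual decision module is meant to address.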
