EXPERIMENT ON METHODS FOR CLUSTERING AND CATEGORIZATION OF POLISH TEXT


Abstract

The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised learning was used for categorization and unsupervised learning for clustering. The methods were examined thoroughly on a custom-built corpus of Polish texts, which the authors assembled from Internet resources. The corpus data came from a news portal, so the documents had already been sorted by type by journalists according to their specialization. The presented algorithms employ the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments revealed certain properties of the algorithms and their accuracy. Accuracy was evaluated as the algorithms' ability to reproduce the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of allocation.
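The abstract names three building blocks (VSM, TF-IDF weighting, and the F-measure) without giving formulas. The sketch below illustrates them in plain Python under stated assumptions: it uses a common textbook TF-IDF variant (relative term frequency times the natural log of inverse document frequency), cosine similarity as the vector comparison, and a per-class F-measure. The function names, the tokenized toy documents, and the labels are hypothetical and are not taken from the paper, whose exact weighting and evaluation details the abstract does not state.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors in the vector space model."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def f_measure(predicted, actual, label, beta=1.0):
    """F-measure for one class, comparing predicted labels with human labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == label and a == label for p, a in pairs)
    fp = sum(p == label and a != label for p, a in pairs)
    fn = sum(p != label and a == label for p, a in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical, already tokenized toy documents (not from the paper's corpus).
docs = [["mecz", "gol", "mecz"], ["gol", "liga"], ["rząd", "sejm"]]
vectors = tf_idf_vectors(docs)
similar = cosine(vectors[0], vectors[1])    # the two documents share the term "gol"
unrelated = cosine(vectors[0], vectors[2])  # no shared terms
score = f_measure(
    ["sport", "sport", "politics"],   # hypothetical algorithm output
    ["sport", "politics", "politics"],  # hypothetical human labels
    "sport",
)
```

In this toy run the two sports-like documents get a positive similarity because they share a term, while documents with no shared vocabulary score zero. A real pipeline for Polish would likely also need lemmatization or stemming before weighting, which this sketch leaves out.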

Bibliographic record

  • Source
    Computing and Informatics | 2017, Issue 1 | pp. 186-204 | 19 pages
  • Author affiliations

    AGH Univ Sci & Technol, Acad Comp Ctr Cyfronet AGH, Nawojki 11, PL-30950 Krakow, Poland;


  • Original format: PDF
  • Language: English
  • Keywords

    Polish text; categorization; clustering; VSM; TF-IDF;

