Experiment on Methods for Clustering and Categorization of Polish Text

Wielgosz Maciej; Fraczek Rafał; Russek Paweł; Pietroń Marcin; Dabrowska-Boruch Agnieszka; Jamro Ernest; Wiatr Kazimierz


Abstract

The main goal of this work was to experimentally verify methods for the challenging task of categorizing and clustering Polish text. Supervised learning was used for categorization and unsupervised learning for clustering. The methods were examined in depth on a custom-built corpus of Polish texts, which the authors assembled from Internet resources. The corpus data came from a news portal, so the documents had already been sorted by type by journalists according to their specialization. The presented algorithms use the Vector Space Model (VSM) and the TF-IDF (Term Frequency-Inverse Document Frequency) weighting scheme. A series of experiments revealed certain properties of the algorithms and their accuracy. Accuracy was assessed as the algorithms' ability to match the human arrangement of the documents by topic. For both categorization and clustering, the authors used the F-measure to assess the quality of allocation.
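
The abstract names three building blocks: the Vector Space Model, TF-IDF weighting, and the F-measure. The following is a minimal Python sketch of how these pieces typically fit together. It is not the authors' implementation. The TF-IDF variant (relative term frequency multiplied by log(N/df)), the use of cosine similarity, and the function names `tfidf_vectors`, `cosine`, and `f_measure` are all assumptions made for illustration, and the toy documents are invented.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: weight} TF-IDF vector.

    Assumed variant: tf = count / document length, idf = log(N / df).
    The paper's abstract does not say which variant the authors used.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors in the vector space model."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def f_measure(predicted, actual, label):
    """F-measure (harmonic mean of precision and recall) for one class label."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == label and a == label)
    fp = sum(1 for p, a in pairs if p == label and a != label)
    fn = sum(1 for p, a in pairs if p != label and a == label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy corpus: two sports-like documents and one politics-like document.
docs = [
    ["mecz", "gol", "liga"],
    ["rząd", "sejm", "ustawa"],
    ["mecz", "liga", "trener"],
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[2]))  # shares terms with document 0
print(cosine(vectors[0], vectors[1]))  # shares no terms with document 0
print(f_measure(["a", "a", "b", "b"], ["a", "b", "b", "b"], "a"))
```

In this sketch the human-assigned categories play the role of `actual`, and the output of a classifier or clustering step plays the role of `predicted`, which mirrors the abstract's description of comparing algorithm output against the journalists' arrangement.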

Bibliographic details

Source: Computing and informatics, 2017, Issue 1, 19 pages
Authors: Wielgosz Maciej; Fraczek Rafał; Russek Paweł; Pietroń Marcin; Dabrowska-Boruch Agnieszka; Jamro Ernest; Wiatr Kazimierz
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. Experiment on Methods for Clustering and Categorization of Polish Text [J]. Wielgosz Maciej, Fraczek Rafal, Russek Pawel. Computing and informatics. 2017, Issue 1.
2. Clustering-based Method for Positive and Unlabeled Text Categorization Enhanced by Improved TFIDF [J]. Lu Liu, Tao Peng. Journal of information science and engineering. 2014, Issue 5.
3. Categorization of Persons Based on Their Mentions in Polish News Texts [J]. Maciej Pachocki, Anna Wróblewska. Journal of Automation, Mobile Robotics & Intelligent Systems. 2020, Issue 2.
4. Experiments on the Use of Feature Selection and Machine Learning Methods in Automatic Malay Text Categorization [C]. Hamood Alshalabi, Sabrina Tiun, Nazlia Omar. International Conference on Electrical Engineering and Informatics. 2014.
5. The implementation of dynamic document organization using the integration of text clustering and text categorization [D]. Jo, Taeho. 2006.
6. Categorization of free-text problem lists: an effective method of capturing clinical data [O]. J. Zelingher, D. M. Rind, E. Caraballo. 1995.
7. Experiments on the Use of Feature Selection and Machine Learning Methods in Automatic Malay Text Categorization [O]. Alshalabi Hamood, Tiun Sabrina, Omar Nazlia. 2013.
