Unsupervised Topic Labeling of Text Based on Wikipedia Categorization

Tetyana Loskutova

Abstract

Defining text topicality is often an expensive problem that requires significant resources for text labeling. Though many packages already exist that provide dictionaries of labeled text, synonyms, and Part-of-Speech tagging, the problem is ongoing as language develops and new meanings of words and phrases emerge. This paper proposes a solution to topic labeling of any text in the majority of languages that requires little human labor. The methodology uses links to a naturally emerging corpus of labeled text: Wikipedia. Wikipedia categories are processed to extract a weighted set of topic labels for the analyzed text. The approach is evaluated by processing categorized texts and comparing the similarity of the top-ranked topic labels to the text category. The topic labels extracted using this methodology can be used for comparing the similarity of texts, for assessing the completeness of topic coverage in automated essay marking, and for coding in qualitative text analysis. The paper contributes to the field of NLP by offering a cheap and organically developing method of topical text labeling. It also contributes to the work of qualitative analysts by offering a methodology for the analysis of interview transcripts and other unstructured text.
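
The abstract does not say how category weights are computed or which similarity measure is used in the evaluation. The sketch below shows one minimal way to turn Wikipedia categories into a weighted, ranked label set, assuming the text has already been matched to Wikipedia article titles and that each article's categories are available as a plain dictionary (`term_to_categories`, hypothetical). The equal-share weighting and the word-overlap (Jaccard) evaluation score are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def weighted_topic_labels(text_terms, term_to_categories, top_k=5):
    """Rank Wikipedia categories as topic labels for one text.

    text_terms: terms from the text already matched to Wikipedia
        article titles (the matching step is assumed to happen elsewhere).
    term_to_categories: hypothetical dict mapping an article title to
        the list of Wikipedia categories attached to that article.
    Weighting (illustrative assumption): every occurrence of a term
    splits one unit of weight equally among its categories.
    """
    weights = Counter()
    for term in text_terms:
        categories = term_to_categories.get(term, [])
        if not categories:
            continue  # no known categories for this term; skip it
        share = 1.0 / len(categories)
        for category in categories:
            weights[category] += share

    total = sum(weights.values())
    if total == 0:
        return []
    # Highest weight first; ties broken alphabetically for a stable order.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    # Normalize by the total weight over all categories, then keep the top_k.
    return [(category, weight / total) for category, weight in ranked[:top_k]]


def _words(label):
    """Lower-cased set of the words in a label."""
    return set(label.lower().split())


def top_label_similarity(labels, text_category):
    """Best word-overlap (Jaccard) score between any returned label and
    the known category of the text; an assumed evaluation measure."""
    target = _words(text_category)
    best = 0.0
    for label, _weight in labels:
        words = _words(label)
        union = words | target
        if union:
            best = max(best, len(words & target) / len(union))
    return best


if __name__ == "__main__":
    # Hand-made toy data for illustration; not fetched from Wikipedia.
    terms = ["Photosynthesis", "Chlorophyll", "Photosynthesis", "Carbon dioxide"]
    categories = {
        "Photosynthesis": ["Plant physiology", "Metabolism"],
        "Chlorophyll": ["Plant physiology", "Photosynthetic pigments"],
        "Carbon dioxide": ["Greenhouse gases", "Metabolism"],
    }
    labels = weighted_topic_labels(terms, categories, top_k=3)
    print(labels)
    print(top_label_similarity(labels, "Plant metabolism"))
```

With this toy data, "Metabolism" and "Plant physiology" tie at a normalized weight of 0.375, followed by "Greenhouse gases" at 0.125, and the best overlap with the category "Plant metabolism" is 0.5. Splitting each occurrence's weight across its categories keeps terms with many categories from dominating the ranking; a real implementation might instead weight by category depth or specificity.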

Bibliographic Details

Source
Journal of Systemics, Cybernetics and Informatics, 2019, No. 4, 5 pages
Author
Tetyana Loskutova
Original format
PDF

Similar Documents

1. Boosting algorithms with topic modeling for multi-label text categorization: A comparative empirical study [J]. Bassam Al-Salemi, Mohd. Juzaiddin Ab Aziz, Shahrul Azman Noah. Journal of Information Science, 2015, No. 5.

2. Arabic text categorization based on Arabic Wikipedia [J]. Lalit Saxena. Computing Reviews, 2014, No. 9.

3. Arabic Text Categorization Based on Arabic Wikipedia [J]. Adnan Yahya, Ali Salhi. ACM Transactions on Asian Language Information Processing, 2014, No. 1.

4. Unsupervised Topic Labeling of Text Based on Wikipedia Categorization [C]. Tetyana Loskutova. World Multi-Conference on Systemics, Cybernetics and Informatics, 2019.

5. Edits Based Categorization of Crowd Sourced Document Corpora with Application to Wikipedia [D]. Fang, Yue. 2018.

6. Biomedical Text Categorization Based on Ensemble Pruning and Optimized Topic Modelling [O]. Aytuğ Onan. 2018.

7. Arabic text categorization based on Arabic Wikipedia [O]. Yahya Adnan, Salhi Ali. 2014.
