Classification of Short Texts by Deploying Topical Annotations

Abstract

We propose a novel approach to the classification of short texts that rests on two components. The first is the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them as Wikipedia pages. The second is a novel classification algorithm that measures the similarity between the input text and each output category using only their annotated topics and the Wikipedia link structure. Our approach forgoes the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in construction and query time, as accurate as state-of-the-art classifiers (see e.g. Phan et al., WWW '08), and robust with respect to concept drift and input sources.
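
The idea of comparing a text with each category through annotated topics and link structure alone can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes the annotator has already run, so each text and each category is a set of Wikipedia page titles, and it swaps in the Milne-Witten link-based relatedness measure as one plausible way to use the link structure. The function names, the toy in-link sets, and `TOTAL_PAGES` are all hypothetical.

```python
from math import log

def link_relatedness(in_a, in_b, total_pages):
    """Milne-Witten style relatedness computed from two sets of in-linking page ids.
    Assumption: this stands in for whatever link-based measure the paper uses."""
    common = in_a & in_b
    if not common:
        return 0.0
    big = max(len(in_a), len(in_b))
    small = min(len(in_a), len(in_b))
    denom = log(total_pages) - log(small)
    if denom <= 0:  # degenerate case: a page linked from every page
        return 1.0
    distance = (log(big) - log(len(common))) / denom
    return max(0.0, 1.0 - distance)

def category_score(text_topics, category_topics, inlinks, total_pages):
    """Average, over the text's topics, of the best relatedness to any category topic."""
    if not text_topics or not category_topics:
        return 0.0
    total = 0.0
    for t in text_topics:
        total += max(
            1.0 if t == c else link_relatedness(
                inlinks.get(t, set()), inlinks.get(c, set()), total_pages)
            for c in category_topics
        )
    return total / len(text_topics)

def classify(text_topics, categories, inlinks, total_pages):
    """Return the name of the category with the highest score."""
    return max(
        categories,
        key=lambda name: category_score(
            text_topics, categories[name], inlinks, total_pages),
    )

# Toy data (hypothetical): page title -> ids of pages that link to it.
TOTAL_PAGES = 1000
INLINKS = {
    "Football": {1, 2, 3, 4},
    "FIFA World Cup": {2, 3, 4, 5},
    "Goalkeeper": {1, 2, 3},
    "Stock market": {10, 11, 12},
    "Inflation": {11, 12, 13},
}
CATEGORIES = {
    "sports": {"Football", "FIFA World Cup"},
    "business": {"Stock market", "Inflation"},
}
TEXT_TOPICS = {"Goalkeeper", "FIFA World Cup"}

predicted = classify(TEXT_TOPICS, CATEGORIES, INLINKS, TOTAL_PAGES)
```

In this sketch each category stays a small set of readable page titles, which mirrors the compact, intelligible category representation the abstract describes; no extra feature dimensions are created.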

Bibliographic details

Source
Advances in Information Retrieval, 2012, pp. 376-387 (12 pages)
Conference location: Barcelona (ES)
Authors
Daniele Vitale; Paolo Ferragina; Ugo Scaiella
Author affiliations

Dipartimento di Informatica, University of Pisa, Italy (listed for each of the three authors)

Original format: PDF
Language: English
Chinese Library Classification: Information processing
