Published in: Advances in Information Retrieval (conference proceedings)

Classification of Short Texts by Deploying Topical Annotations


Abstract

We propose a novel approach to the classification of short texts that rests on two components. The first is the use of recently introduced Wikipedia-based annotators, which detect the main topics present in an input text and represent them via Wikipedia pages. The second is a novel classification algorithm that measures the similarity between the input text and each output category by deploying only their annotated topics and the Wikipedia link structure. Our approach waives the common practice of expanding the feature space with new dimensions derived from either explicit or latent semantic analysis. As a consequence, it is simple and maintains a compact, intelligible representation of the output categories. Our experiments show that it is efficient in construction and query time, as accurate as state-of-the-art classifiers (see, e.g., Phan et al., WWW '08), and robust with respect to concept drift and input sources.
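
The general idea of comparing a text with a category through annotated topics and link structure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the similarity measure, so Jaccard overlap of in-link sets is used as a stand-in, the link data in `INLINKS` is a hypothetical toy graph, and the annotation step is skipped (topics are passed in directly as if an annotator had already produced them). It is not the authors' algorithm.

```python
from typing import Dict, List, Set

# Hypothetical toy link graph: page title -> set of pages linking to it.
# A real system would derive this from the Wikipedia link structure.
INLINKS: Dict[str, Set[str]] = {
    "Football": {"Sport", "FIFA", "Goal", "Stadium"},
    "FIFA World Cup": {"Sport", "FIFA", "Stadium", "Brazil"},
    "Tennis": {"Sport", "Stadium", "Wimbledon"},
    "Stock market": {"Finance", "Bank", "Investor"},
    "Inflation": {"Finance", "Bank", "Economy"},
}

# Each category is represented only by a short list of topics (pages),
# which keeps the representation compact and human-readable.
CATEGORIES: Dict[str, List[str]] = {
    "Sports": ["Football", "Tennis"],
    "Business": ["Stock market", "Inflation"],
}


def relatedness(a: str, b: str) -> float:
    """Stand-in relatedness: Jaccard overlap of the two pages' in-link sets."""
    if a == b:
        return 1.0
    links_a = INLINKS.get(a, set())
    links_b = INLINKS.get(b, set())
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def score(text_topics: List[str], category_topics: List[str]) -> float:
    """Average, over the text's topics, of the best match in the category."""
    if not text_topics or not category_topics:
        return 0.0
    best = [max(relatedness(t, c) for c in category_topics) for t in text_topics]
    return sum(best) / len(best)


def classify(text_topics: List[str], categories: Dict[str, List[str]]) -> str:
    """Return the category whose topics are most related to the text's topics."""
    return max(categories, key=lambda name: score(text_topics, categories[name]))


# Topics assumed to have been produced by an annotator for a short text.
print(classify(["FIFA World Cup"], CATEGORIES))
```

No feature vector is built or expanded here: both the input and the categories stay as small sets of page titles, which is the property the abstract highlights.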
