A Pseudo-document-based Topical N-grams model for short texts

Abstract

In recent years, short text topic modeling has drawn considerable attention from interdisciplinary researchers. Various customized topic models have been proposed to tackle the semantic sparseness of short texts. Most (if not all) of them follow the bag-of-words assumption, which is inadequate because word order and phrases are often critical to capturing the meaning of a text. On the other hand, while some existing topic models are sensitive to word order, they do not perform well on short texts because of severe data sparseness. To address these issues, we propose the Pseudo-document-based Topical N-Grams model (PTNG), which alleviates the data sparsity problem of short texts while remaining sensitive to word order. Extensive experiments on three real-world data sets against state-of-the-art baselines show that PTNG learns high-quality topics, as measured by UCI coherence scores, and yields more discriminative semantic representations of short texts, as measured by classification results.
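The bag-of-words limitation named in the abstract can be illustrated with a minimal sketch. This is not the PTNG model; it only shows, on two made-up example sentences, that word counts alone cannot distinguish texts that differ in word order, whereas bigram counts can. The function names `bag_of_words` and `bigrams` are illustrative choices, not taken from the paper.

```python
from collections import Counter

def bag_of_words(text):
    # Unordered word counts: all word-order information is discarded.
    return Counter(text.lower().split())

def bigrams(text):
    # Counts of adjacent word pairs: local word order is preserved.
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))

# Two hypothetical short texts with the same words in a different order.
a = "dog bites man"
b = "man bites dog"

same_bag = bag_of_words(a) == bag_of_words(b)
same_bigrams = bigrams(a) == bigrams(b)

print(same_bag)      # True: the two texts are indistinguishable as bags of words
print(same_bigrams)  # False: the bigram representations differ
```

The bigram representation is also sparser than the unigram one, which hints at why order-sensitive models tend to struggle on short texts.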
Bibliographic record

  • Source
    World Wide Web | 2020, Issue 6 | pp. 3001-3023 | 23 pages
  • Author affiliations

    School of Economics and Management, Beihang University, Beijing 100191, China;

    School of Economics and Management, Beihang University, Beijing 100191, China;

    School of Economics and Management, Beihang University, Beijing 100191, China;

    School of Economics and Management, Beihang University, Beijing 100191, China;

    School of Economics and Management, Beihang University, Beijing 100191, China / Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China / Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, Beihang University, Beijing 100191, China;

    Jiangsu Provincial Key Laboratory of E-Business, Nanjing University of Finance and Economics, Nanjing, China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Short text; Topic model; Word order; Topical N-Grams;
