
Improving short text classification by learning vector representations of both words and hidden topics


Abstract

This paper presents a general framework for short text classification that jointly learns vector representations of words and hidden topics. We draw on a large-scale external data collection, referred to as the "corpus", that is topically consistent with the short texts to be classified, and use it to build a topic model with Latent Dirichlet Allocation (LDA). For every text in the corpus and every short text, the topics assigned to words are treated as new words and integrated into the text to enrich the data. From the enriched corpus, we learn vector representations of both words and topics. Short texts can then be represented using the vectors of both words and topics for training and classification. On an open short text classification data set, learning vectors of both words and topics significantly helps reduce classification error compared with learning word vectors alone. We also compared the proposed classification method with various baselines, and the experimental results support the effectiveness of our word/topic vector representations. (C) 2016 Elsevier B.V. All rights reserved.
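The enrichment and representation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how topic tokens are placed in the text or how vectors are combined, so the `TOPIC_<id>` pseudo-word format, the word-then-topic interleaving, the averaging of vectors, and the function names `enrich` and `text_features` are all assumptions made for illustration. The per-word topic ids are assumed to come from an LDA model trained elsewhere, and the toy vectors stand in for embeddings learned on the enriched corpus.

```python
from typing import Dict, List, Sequence


def enrich(tokens: Sequence[str], topic_ids: Sequence[int]) -> List[str]:
    """Follow each word with a pseudo-word naming its assigned topic.

    The interleaving order and the TOPIC_<id> naming are assumptions.
    """
    if len(tokens) != len(topic_ids):
        raise ValueError("need exactly one topic id per token")
    enriched: List[str] = []
    for word, topic in zip(tokens, topic_ids):
        enriched.append(word)
        enriched.append(f"TOPIC_{topic}")
    return enriched


def text_features(
    enriched: Sequence[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Average the vectors of all known tokens (words and topic pseudo-words).

    Averaging is one simple choice of text representation, assumed here.
    """
    total = [0.0] * dim
    count = 0
    for token in enriched:
        vec = vectors.get(token)
        if vec is None:
            continue  # skip tokens that have no learned vector
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    return [x / count for x in total] if count else total


# Toy inputs: topic ids would come from LDA, vectors from an embedding model.
tokens = ["stock", "market", "falls"]
topic_ids = [2, 2, 5]
vectors = {
    "stock": [1.0, 0.0],
    "market": [0.0, 1.0],
    "TOPIC_2": [1.0, 1.0],
    "TOPIC_5": [0.0, 2.0],
}

enriched = enrich(tokens, topic_ids)
features = text_features(enriched, vectors, dim=2)
```

In this sketch the word "falls" has no vector and is skipped, but its topic pseudo-word still contributes, which hints at why topic tokens may help when short texts contain rare words.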

Bibliographic details

  • Source
    Knowledge-Based Systems | 2016, Issue 15 | pp. 76-86 | 11 pages
  • Authors

    Zhang Heng; Zhong Guoqiang;

  • Author affiliations

    Chinese Academy of Sciences, Institute of Automation, Interactive Digital Media Technology Research Center, 95 Zhongguancun East Rd, Beijing 100190, People's Republic of China;

    Ocean University of China, Department of Computer Science and Technology, 238 Songling Rd, Qingdao 266100, People's Republic of China;

  • Original format: PDF
  • Language of text: English
  • Keywords

    Short texts; Topic model; Data enrich; Word and topic vectors;

  • Date added to database: 2022-08-18 02:49:57
