Published in: Sensors (Basel, Switzerland)

A Method of Short Text Representation Based on the Feature Probability Embedded Vector


Abstract

Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which can lack semantic information and suffers from high dimensionality and high sparsity. A popular way to address these problems is to use deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to obtain word vectors with the word embedding technique Word2Vec and then combine them with TF-IDF feature weighting and the topic model LDA. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model but also reduces the dimensionality of the document vector. In addition, it mitigates the insufficient semantic information, high dimensionality, and high sparsity of BoW. We apply the proposed method to the task of text categorization and verify its validity.
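The abstract's core idea — a TF-IDF-weighted average of Word2Vec word vectors, combined with an LDA topic distribution for the document — can be sketched in plain Python. This is only an illustration of that combination, not the paper's exact formulation: the function name `fpwe_vector`, the toy word vectors, and the concatenation of the topic distribution are assumptions for the example (in practice the vectors would come from a trained Word2Vec model and the topic distribution from a trained LDA model).

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus):
    """TF-IDF weight for each distinct token in a document (sketch).

    corpus is a list of tokenized documents; uses smoothed IDF.
    """
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / len(doc_tokens)) * idf
    return weights

def fpwe_vector(doc_tokens, corpus, word_vecs, topic_dist):
    """Document vector: TF-IDF-weighted average of word embeddings,
    concatenated with the document's LDA topic distribution.

    word_vecs maps token -> embedding (list of floats);
    topic_dist is the per-document topic probability vector.
    """
    w = tfidf_weights(doc_tokens, corpus)
    dim = len(next(iter(word_vecs.values())))
    acc = [0.0] * dim
    total = 0.0
    for t in doc_tokens:
        if t in word_vecs:
            for i, v in enumerate(word_vecs[t]):
                acc[i] += w[t] * v
            total += w[t]
    if total > 0:
        acc = [x / total for x in acc]
    # The result is low-dimensional and dense: embedding_dim + n_topics,
    # instead of a vocabulary-sized sparse BoW vector.
    return acc + list(topic_dist)
```

With a toy two-word document, two-dimensional embeddings, and a two-topic distribution, the resulting document vector has length 4 (embedding dimension plus topic count), which illustrates how this scheme sidesteps BoW's vocabulary-sized sparse vectors.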
