Published in: Sensors (Basel, Switzerland)

A Method of Short Text Representation Based on the Feature Probability Embedded Vector


Abstract

Text representation is one of the key tasks in the field of natural language processing (NLP). Traditional feature extraction and weighting methods often use the bag-of-words (BoW) model, which can lack semantic information and suffers from high dimensionality and high sparsity. A popular way to address these problems is to use deep learning methods. In this paper, feature weighting, word embedding, and topic models are combined to propose an unsupervised text representation method named the feature, probability, and word embedding method. The main idea is to obtain word vectors with the word embedding technique Word2Vec and then combine them with TF-IDF feature weighting and the topic model LDA. Compared with traditional feature engineering, the proposed method not only increases the expressive ability of the vector space model but also reduces the dimensionality of the document vector. In addition, it mitigates the insufficient semantic information, high dimensionality, and high sparsity of BoW. We apply the proposed method to the task of text categorization and verify its validity.
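The abstract's core idea — a TF-IDF-weighted average of Word2Vec word vectors, combined with an LDA topic distribution for the document — can be sketched in plain Python. This is only an illustration of that combination, not the paper's exact formulation: the function name `fpwe_vector`, the toy word vectors, and the concatenation of the topic distribution are assumptions for the example (in practice the vectors would come from a trained Word2Vec model and the topic distribution from a trained LDA model).

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus):
    """TF-IDF weight for each distinct token in a document (sketch).

    corpus is a list of tokenized documents; uses smoothed IDF.
    """
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for d in corpus if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = (count / len(doc_tokens)) * idf
    return weights

def fpwe_vector(doc_tokens, corpus, word_vecs, topic_dist):
    """Document vector: TF-IDF-weighted average of word embeddings,
    concatenated with the document's LDA topic distribution.

    word_vecs maps token -> embedding (list of floats);
    topic_dist is the per-document topic probability vector.
    """
    w = tfidf_weights(doc_tokens, corpus)
    dim = len(next(iter(word_vecs.values())))
    acc = [0.0] * dim
    total = 0.0
    for t in doc_tokens:
        if t in word_vecs:
            for i, v in enumerate(word_vecs[t]):
                acc[i] += w[t] * v
            total += w[t]
    if total > 0:
        acc = [x / total for x in acc]
    # The result is low-dimensional and dense: embedding_dim + n_topics,
    # instead of a vocabulary-sized sparse BoW vector.
    return acc + list(topic_dist)
```

With a toy two-word document, two-dimensional embeddings, and a two-topic distribution, the resulting document vector has length 4 (embedding dimension plus topic count), which illustrates how this scheme sidesteps BoW's vocabulary-sized sparse vectors.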
