Concept decompositions for short text clustering by identifying word communities

Jia Caiyan; Carson Matthew B.; Wang Xiaoyang; Yu Jian

Abstract

Short text clustering is an increasingly important methodology but faces the challenges of sparsity and high-dimensionality of text data. Previous concept decomposition methods have obtained concept vectors via the centroids of clusters using k-means-type clustering algorithms on normal, full texts. In this study, we propose a new concept decomposition method that creates concept vectors by identifying semantic word communities from a weighted word co-occurrence network extracted from a short text corpus or a subset thereof. The cluster memberships of short texts are then estimated by mapping the original short texts to the learned semantic concept vectors. The proposed method is not only robust to the sparsity of short text corpora but also overcomes the curse of dimensionality, scaling to a large number of short text inputs due to the concept vectors being obtained from term-term instead of document-term space. Experimental tests have shown that the proposed method outperforms state-of-the-art algorithms. (C) 2017 Elsevier Ltd. All rights reserved.
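
The pipeline outlined in the abstract (word co-occurrence network, word communities, concept vectors, mapping short texts to concepts) can be illustrated with a minimal Python sketch. The abstract does not specify the community detection algorithm, the edge weighting, or the term weights inside a concept vector, so everything below is an assumption made for illustration: edge weights are co-document counts, communities are approximated by connected components of the thresholded network (a simple stand-in, not the paper's method), concept vectors weight their words uniformly, and texts are assigned by cosine similarity. All function names and the `min_weight` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def cooccurrence_network(docs):
    """Edge weight = number of short texts in which both words appear."""
    weights = Counter()
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for u, v in combinations(words, 2):
            weights[(u, v)] += 1
    return weights


def word_communities(weights, min_weight=2):
    """Stand-in for community detection (assumption): connected
    components of the network restricted to edges >= min_weight."""
    adj = defaultdict(set)
    for (u, v), w in weights.items():
        if w >= min_weight:
            adj[u].add(v)
            adj[v].add(u)
    seen, communities = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        communities.append(comp)
    return communities


def concept_vectors(communities):
    """One unit-length vector per community, uniform over its words."""
    return [{w: 1.0 / math.sqrt(len(c)) for w in c} for c in communities]


def assign(doc, concepts):
    """Index of the most cosine-similar concept, or -1 if none overlaps."""
    counts = Counter(doc.lower().split())
    if not counts:
        return -1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    best, best_sim = -1, 0.0
    for i, concept in enumerate(concepts):
        sim = sum(c * concept.get(w, 0.0) for w, c in counts.items()) / norm
        if sim > best_sim:
            best, best_sim = i, sim
    return best


# Toy corpus (invented for illustration, not data from the paper).
docs = [
    "apple banana fruit", "banana fruit juice", "apple fruit juice",
    "python code bug", "code bug fix", "python code fix",
]
communities = word_communities(cooccurrence_network(docs))
concepts = concept_vectors(communities)
labels = [assign(d, concepts) for d in docs]
```

In this sketch the concept vectors are built from word-word relations only, so their construction does not depend on the number of documents being clustered afterwards; a new short text is placed simply by calling `assign` on it.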

Bibliographic details

Source
Pattern Recognition: The Journal of the Pattern Recognition Society | 2018, issue 2018 | 13 pages
Authors
Jia Caiyan; Carson Matthew B.; Wang Xiaoyang; Yu Jian
Author affiliations

Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China;
Northwestern Univ, Feinberg Sch Med, Dept Prevent Med, Div Hlth & Biomed Informat, Chicago, IL 60611, USA;
Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China;
Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China;

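Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology
Keywords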
Short text clustering; Concept decomposition; Spherical k-means; Semantic word community; Community detection;
