Pacific-Asia Conference on Knowledge Discovery and Data Mining

Sequential Embedding Induced Text Clustering, a Non-parametric Bayesian Approach



Abstract

Current state-of-the-art nonparametric Bayesian text clustering methods model documents through multinomial distributions over bags of words. Although these methods can effectively exploit the word-burstiness representation of documents and achieve decent performance, they ignore the sequential information in text and the relationships among synonyms. In this paper, documents are modeled jointly by bags of words, sequential features, and word embeddings. We propose the Sequential Embedding induced Dirichlet Process Mixture Model (SiDPMM) to effectively exploit this joint document representation in text clustering. The sequential features are extracted by an encoder-decoder component, and word embeddings produced by the continuous-bag-of-words (CBOW) model are introduced to handle synonyms. Experimental results demonstrate the benefits of our model in two major respects: (1) improved performance across multiple diverse text datasets in terms of normalized mutual information (NMI); (2) more accurate inference of the ground-truth number of clusters, with a regularization effect that suppresses tiny outlier clusters.
