Distributed Document and Phrase Co-embeddings for Descriptive Clustering

Abstract

Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
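The labelling step described in the abstract can be illustrated with a minimal sketch. It assumes that documents and phrases already live in one shared vector space (for example, vectors produced by a paragraph vector model trained on both) and that cluster assignments are given. The selection rule used here, the phrase with the highest cosine similarity to the cluster centroid, is an assumption made for illustration and is not confirmed by the abstract. The function name `label_clusters` and the toy vectors are hypothetical.

```python
import numpy as np


def label_clusters(doc_vecs, phrase_vecs, phrases, assignments):
    """Return {cluster_id: phrase} using nearest-phrase-to-centroid (cosine).

    Assumption: doc_vecs and phrase_vecs come from the same co-embedding,
    so distances between documents and phrases are meaningful.
    """
    def unit(x):
        # Scale vectors to unit length so dot products equal cosine similarity.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    doc_vecs = unit(np.asarray(doc_vecs, dtype=float))
    phrase_vecs = unit(np.asarray(phrase_vecs, dtype=float))
    assignments = np.asarray(assignments)

    labels = {}
    for cluster_id in np.unique(assignments):
        # Centroid of the normalised document vectors in this cluster.
        centroid = unit(doc_vecs[assignments == cluster_id].mean(axis=0))
        # Cosine similarity of every candidate phrase to the centroid.
        similarities = phrase_vecs @ centroid
        labels[int(cluster_id)] = phrases[int(np.argmax(similarities))]
    return labels


# Toy data (made up): four documents in two clusters, three candidate phrases.
doc_vecs = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
phrase_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
phrases = ["gene expression", "stock market", "data analysis"]
assignments = [0, 0, 1, 1]

labels = label_clusters(doc_vecs, phrase_vecs, phrases, assignments)
print(labels)
```

In practice the cluster assignments would come from a clustering algorithm run on the document vectors; they are hard-coded here to keep the sketch self-contained.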


Bibliographic details

Source: Conference of the European Chapter of the Association for Computational Linguistics, 2017, xxxviii p. 643-1280 (11 pages)
Authors: Motoki Sato; Austin J. Brockmeier; Georgios Kontonatsios; Tingting Mu; John Y. Goulermas; Junichi Tsujii; Sophia Ananiadou
Original format: PDF
Chinese Library Classification: Programming; Software engineering

