Conference of the European Chapter of the Association for Computational Linguistics (EACL)

Distributed Document and Phrase Co-embeddings for Descriptive Clustering



Abstract

Descriptive document clustering aims to automatically discover groups of semantically related documents and to assign a meaningful label to characterise the content of each cluster. In this paper, we present a descriptive clustering approach that employs a distributed representation model, namely the paragraph vector model, to capture semantic similarities between documents and phrases. The proposed method uses a joint representation of phrases and documents (i.e., a co-embedding) to automatically select a descriptive phrase that best represents each document cluster. We evaluate our method by comparing its performance to an existing state-of-the-art descriptive clustering method that also uses co-embedding but relies on a bag-of-words representation. Results obtained on benchmark datasets demonstrate that the paragraph vector-based method obtains superior performance over the existing approach in both identifying clusters and assigning appropriate descriptive labels to them.
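The abstract describes two steps: grouping documents whose vectors lie close together in a shared space, then choosing for each group the phrase that sits closest to it in that same space. The Python sketch below illustrates that idea under stated assumptions. The document and phrase vectors are taken as already computed by some co-embedding model (here replaced by hand-made 2-D toy vectors), clustering is plain k-means with cosine similarity, and the label is the phrase nearest the cluster centroid. The function names, the clustering algorithm, the label-selection rule, and the toy phrases are all hypothetical illustrative choices, not necessarily the procedure used in the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_documents(docs: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Hypothetical helper: k-means with cosine similarity.

    Assumption: centroids start at the first k documents (k <= len(docs)).
    """
    centroids = [list(d) for d in docs[:k]]
    assignment: List[int] = []
    for _ in range(iters):
        # Assign every document to its most cosine-similar centroid.
        new_assignment = [
            max(range(k), key=lambda c: cosine(doc, centroids[c])) for doc in docs
        ]
        if new_assignment == assignment:  # converged: no document moved
            break
        assignment = new_assignment
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return assignment


def label_clusters(
    docs: List[Vector], assignment: List[int], phrase_vectors: Dict[str, Vector]
) -> Dict[int, str]:
    """Hypothetical helper: label each cluster with the phrase closest to its centroid."""
    labels = {}
    for c in sorted(set(assignment)):
        center = centroid([d for d, a in zip(docs, assignment) if a == c])
        labels[c] = max(phrase_vectors, key=lambda p: cosine(phrase_vectors[p], center))
    return labels


# Toy 2-D "co-embedded" vectors (hypothetical; a trained model would produce these).
docs = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9]]
phrase_vectors = {
    "neural networks": [0.95, 0.05],
    "protein folding": [0.05, 0.95],
    "general methods": [0.7, 0.7],
}

assignment = cluster_documents(docs, k=2)
labels = label_clusters(docs, assignment, phrase_vectors)
print(assignment)  # [0, 1, 0, 1]
print(labels)      # {0: 'neural networks', 1: 'protein folding'}
```

Because documents and phrases share one vector space, the same similarity function serves both for grouping documents and for picking labels; replacing the toy lists with vectors from a trained paragraph vector model would leave the clustering and labelling code unchanged.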
