Conference: Asian Conference on Intelligent Information and Database Systems

Word Mover's Distance for Agglomerative Short Text Clustering

Abstract

In the era of information overload, text clustering plays an important part in the analysis and processing pipeline. Partitioning high-quality texts into previously unseen categories greatly helps applications in the information retrieval, database, and business intelligence domains. Short texts from social media environments, such as tweets, however, remain difficult to interpret because of the broad range of contexts they cover. Traditional text similarity approaches rely only on lexical matching and ignore the semantic meaning of words. Recent advances in distributional semantic spaces have opened an alternative approach that uses high-quality word embeddings to aid the interpretation of text semantics. In this paper, we investigate the word mover's distance metric for automatically clustering short texts using word-level semantic information. We use an agglomerative strategy as the clustering method to efficiently group texts based on their similarity. The experiment indicates that the word mover's distance outperforms other standard metrics in the short text clustering task.
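
The idea in the abstract, a semantic distance between short texts fed into bottom-up clustering, can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions that are not in the paper: the 2-D vectors in `EMB` are invented toy embeddings, the distance is the relaxed word mover's distance (a cheap lower bound on the exact WMD, in which every word moves all of its mass to its nearest word in the other text), and the linkage criterion is average linkage. The names `relaxed_wmd`, `agglomerative`, and `TEXTS` are hypothetical.

```python
import math
from itertools import combinations

# Toy 2-D "embeddings", invented for illustration only (assumption).
# A real setup would load pre-trained word vectors instead.
EMB = {
    "cat": (1.0, 0.0), "kitten": (0.9, 0.1), "dog": (0.8, 0.3),
    "stock": (0.0, 1.0), "market": (0.1, 0.9), "shares": (0.2, 1.0),
}

# Four tokenized short texts: two about pets, two about finance.
TEXTS = [["cat", "kitten"], ["dog", "cat"],
         ["stock", "market"], ["shares", "market"]]


def euclid(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided(src, dst):
    """Average cost when each source token moves to its nearest target token."""
    return sum(min(euclid(EMB[s], EMB[t]) for t in dst) for s in src) / len(src)


def relaxed_wmd(a, b):
    """Relaxed WMD: the larger of the two one-sided costs.

    This is a lower bound on the exact word mover's distance, not the
    exact optimal-transport solution (assumption made to avoid a solver).
    """
    return max(one_sided(a, b), one_sided(b, a))


def agglomerative(texts, n_clusters):
    """Average-linkage bottom-up clustering; returns lists of text indices."""
    dist = {(i, j): relaxed_wmd(texts[i], texts[j])
            for i, j in combinations(range(len(texts)), 2)}

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(texts))]
    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    print(agglomerative(TEXTS, 2))
```

With these toy vectors, the two pet-related texts and the two finance-related texts end up in separate clusters even though "kitten" and "dog" never match lexically, which is the kind of behaviour the abstract attributes to embedding-based distances. An exact WMD would instead solve a transport problem between the two bags of words.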