A Novel Text Clustering Approach Using Deep-Learning Vocabulary Network


Abstract

Text clustering is an effective way to collect and organize text documents into meaningful groups so that valuable information can be mined from the Internet. However, open problems remain, notably feature extraction and dimensionality reduction. To address them, we present a novel approach named the deep-learning vocabulary network. The vocabulary network is built from a related-word set, which captures the "co-occurrence" relations among words or terms. We replace the term frequency in feature vectors with the "importance" of each word, computed from the vocabulary network with PageRank, which yields more precise feature vectors for representing the meaning of texts. Furthermore, a sparse-group deep belief network is proposed to reduce the dimensionality of the feature vectors, and we introduce a coverage rate as the similarity measure in Single-Pass clustering. To verify the effectiveness of our work, we compare the approach with representative algorithms; the experimental results show that feature vectors built with the deep-learning vocabulary network achieve better clustering performance.
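The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the vocabulary network is approximated by document-level co-occurrence counts, the sparse-group deep belief network step is omitted entirely, and cosine similarity stands in for the coverage rate, which the abstract does not define. All function names, the toy documents, and the threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def build_cooccurrence_graph(docs):
    """Undirected graph; edge weight = number of documents where two words co-occur."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the weighted co-occurrence graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        new_rank = {}
        for v in nodes:
            # The graph is symmetric, so the neighbours of v are also its in-links.
            incoming = sum(rank[u] * w / out_weight[u] for u, w in graph[v].items())
            new_rank[v] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def feature_vector(tokens, rank, vocab):
    """Use the PageRank 'importance' of each present word instead of term frequency."""
    present = set(tokens)
    return [rank.get(w, 0.0) if w in present else 0.0 for w in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_pass(vectors, threshold=0.5):
    """Single-Pass clustering: join the most similar cluster or open a new one.

    Assumption: cosine similarity replaces the paper's coverage-rate measure.
    """
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            k = sizes[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * k + x) / (k + 1) for c, x in zip(centroids[best], vec)]
            sizes[best] += 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
    return labels


# Toy, already-tokenized documents (hypothetical data for illustration only).
docs = [
    ["text", "clustering", "feature"],
    ["text", "clustering", "vector"],
    ["deep", "belief", "network"],
    ["deep", "network", "sparse"],
]
rank = pagerank(build_cooccurrence_graph(docs))
vocab = sorted(rank)
labels = single_pass([feature_vector(d, rank, vocab) for d in docs])
print(labels)  # → [0, 0, 1, 1]
```

In this toy run, words that co-occur with more neighbours ("text", "clustering") receive a higher importance than words that appear once ("feature"), which is the effect the abstract attributes to replacing term frequency with a PageRank score.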

Bibliographic record

  • Source
    Mathematical Problems in Engineering | 2017, Issue 2017 | 8310934.1-8310934.13 | 13 pages
  • Author affiliations

    Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China|China Informat Technol Secur Evaluat Ctr, Beijing 100085, Peoples R China;

    Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China;

    China Informat Technol Secur Evaluat Ctr, Beijing 100085, Peoples R China;

    Beijing Univ Chem Technol, Coll Informat Sci & Technol, Beijing 100029, Peoples R China;

  • Original format: PDF
  • Language: English
