Third International Conference on Semantics, Knowledge, and Grid (SKG 2007)

Short Text Feature Extraction and Clustering for Web Topic Mining

Abstract

This paper introduces an algorithm, based on Chinese chunks, for clustering Chinese short texts in order to mine web topics. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The RPCL algorithm is then applied to cluster the texts with high precision; it does not need to know the exact number of clusters in advance. Experimental results show that, compared with traditional methods, this approach markedly reduces dimensionality and improves the performance of Chinese short-text clustering.
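The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no parameters, so the n-gram lengths, the document-frequency cutoff `min_df` (used here as a simple stand-in for chunk selection and dimensionality reduction), the learning rates, and all function names are assumptions. The clustering part follows the usual textbook form of rival penalized competitive learning (RPCL): for each input, the winning center moves toward it while the runner-up (the rival) is pushed slightly away, so surplus centers drift off and the number of clusters need not be fixed exactly.

```python
import random
from collections import Counter

def char_ngrams(text, n_values=(2, 3)):
    """Return all character n-grams of the given lengths, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for n in n_values
            for i in range(len(chars) - n + 1)]

def build_vectors(texts, min_df=2, n_values=(2, 3)):
    """Keep n-grams that occur in at least min_df texts (an assumed filter)
    and return the vocabulary plus one count vector per text."""
    grams_per_text = [Counter(char_ngrams(t, n_values)) for t in texts]
    doc_freq = Counter(g for grams in grams_per_text for g in grams)
    vocab = sorted(g for g, d in doc_freq.items() if d >= min_df)
    vectors = [[float(grams[g]) for g in vocab] for grams in grams_per_text]
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rpcl(vectors, k_max=4, lr_win=0.05, lr_rival=0.002, epochs=50, seed=0):
    """Rival penalized competitive learning with k_max candidate centers.
    Requires 2 <= k_max <= len(vectors). Returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k_max)]
    wins = [1] * k_max  # win counts for the frequency-sensitive weighting
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            total = sum(wins)
            # Distance weighted by relative win frequency.
            scores = [wins[j] / total * sq_dist(x, centers[j])
                      for j in range(k_max)]
            ranked = sorted(range(k_max), key=scores.__getitem__)
            winner, rival = ranked[0], ranked[1]
            # Winner learns toward x; rival is de-learned away from x.
            centers[winner] = [w + lr_win * (xi - w)
                               for w, xi in zip(centers[winner], x)]
            centers[rival] = [w - lr_rival * (xi - w)
                              for w, xi in zip(centers[rival], x)]
            wins[winner] += 1
    # Assign each text to its nearest center; unused centers get no texts.
    labels = [min(range(k_max), key=lambda j: sq_dist(x, centers[j]))
              for x in vectors]
    return centers, labels
```

A typical call would be `vocab, vectors = build_vectors(texts)` followed by `centers, labels = rpcl(vectors, k_max=10)`, where `k_max` is only an upper bound on the number of topics. Centers that attract no texts can be discarded afterwards.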

Bibliographic record

  • Conference location: Xian (CN)
  • Author affiliation: School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, P.R. China, 100876
  • Original format: PDF
  • Chinese Library Classification: Computing technology, computer technology
