Conference: International Conference on Semantics, Knowledge and Grid

Short Text Feature Extraction and Clustering for Web Topic Mining

Abstract

This paper introduces an algorithm that clusters Chinese short texts on the basis of Chinese chunks in order to mine web topics. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The RPCL (Rival Penalized Competitive Learning) algorithm is then applied to cluster the texts with high precision, without requiring the exact number of clusters to be known in advance. Experimental results show that, compared with traditional methods, this approach substantially reduces the feature dimensionality and improves the performance of Chinese short-text clustering.
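The abstract names two steps, N-gram chunk extraction and RPCL clustering, without giving parameters. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' implementation: character 2- to 3-grams that never cross punctuation, a hypothetical document-frequency cutoff (`min_df`) standing in for dimensionality reduction, binary L2-normalized vectors, and the standard RPCL update (frequency-sensitive winner attracted, rival pushed away), started with more units (`k_max`) than the expected number of topics. All function and parameter names (`char_ngrams`, `vectorize`, `rpcl`, `min_df`, `min_share`) and the learning rates are hypothetical, as are the farthest-point initialization and fixed sample order.

```python
import math
import re
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Contiguous character n-grams of length n_min..n_max, never spanning punctuation."""
    grams = []
    # In Python 3, \W on str matches non-word Unicode characters, including CJK punctuation.
    for seg in re.split(r"\W+", text):
        for n in range(n_min, n_max + 1):
            grams.extend(seg[i:i + n] for i in range(len(seg) - n + 1))
    return grams


def vectorize(texts, min_df=2, n_min=2, n_max=3):
    """Binary bag-of-n-gram vectors, L2-normalized, over n-grams seen in >= min_df texts."""
    grams_per_text = [set(char_ngrams(t, n_min, n_max)) for t in texts]
    df = Counter(g for gs in grams_per_text for g in gs)
    # Hypothetical dimensionality cut: drop n-grams that occur in fewer than min_df texts.
    vocab = sorted(g for g, c in df.items() if c >= min_df)
    index = {g: i for i, g in enumerate(vocab)}
    vectors = []
    for gs in grams_per_text:
        v = [0.0] * len(vocab)
        for g in gs:
            if g in index:
                v[index[g]] = 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # all-zero rows stay zero
        vectors.append([x / norm for x in v])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(data, k_max, alpha_win=0.05, alpha_rival=0.002, epochs=50, min_share=0.05):
    """Rival Penalized Competitive Learning; returns (surviving units, label per sample)."""
    if k_max < 2:
        raise ValueError("RPCL needs at least two units (a winner and a rival)")
    # Deterministic farthest-point initialization (an assumption, not from the paper).
    units = [list(data[0])]
    while len(units) < k_max:
        far = max(data, key=lambda x: min(sq_dist(x, u) for u in units))
        units.append(list(far))
    wins = [1] * k_max
    for _ in range(epochs):
        for x in data:  # fixed visiting order keeps the sketch reproducible
            total = sum(wins)
            # Frequency-sensitive score: units that have won often are handicapped.
            scores = [wins[j] / total * sq_dist(x, units[j]) for j in range(k_max)]
            c, r = sorted(range(k_max), key=scores.__getitem__)[:2]  # winner, rival
            units[c] = [w + alpha_win * (xi - w) for w, xi in zip(units[c], x)]    # attract
            units[r] = [w - alpha_rival * (xi - w) for w, xi in zip(units[r], x)]  # repel
            wins[c] += 1
    # Units that rarely won were pushed out by rival penalization; discard them.
    total = sum(wins)
    kept = [u for u, n in zip(units, wins) if n / total >= min_share]
    labels = [min(range(len(kept)), key=lambda j: sq_dist(x, kept[j])) for x in data]
    return kept, labels
```

For example, `vectorize(["股票市场大涨", "股票市场下跌", "足球比赛开始", "足球比赛结束"])` keeps only the 10 n-grams shared within each pair, and `rpcl(vectors, 3)` then starts with three units but keeps two, because the surplus unit is repeatedly penalized as the rival and almost never wins. The rival learning rate is kept much smaller than the winner's, which is what lets RPCL discard extra units instead of requiring the true cluster count up front.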

