Published in: International Conference on Intelligent Human-Machine Systems and Cybernetics

Research on Clustering Technology Based on Co-occurrence Word Model for Tibetan Microblog

Abstract

To address two key problems in short-text classification, extremely sparse features and strong context dependency, this paper proposes a clustering technique for Tibetan microblogs based on a co-occurrence word model. A Tibetan news corpus (standard long text) serves as the experimental training corpus, and an LDA model is used to construct a co-occurrence word network related to a theme. The relationships in this co-occurrence network are then used to determine the theme to which each Tibetan microblog short text belongs, which mitigates the lack of semantic correlation and the data sparseness of short texts. Finally, the K-means++ clustering algorithm is used to cluster the texts. A comparison of three clustering methods shows that the proposed method is effective for clustering Tibetan microblog text, reaching an accuracy of 88.06%.
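
The pipeline described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the per-theme word sets in `topic_words` are hand-written English placeholders standing in for the co-occurrence words an LDA model would produce from the Tibetan news corpus, the overlap score in `topic_vector` is only one plausible way to relate a short text to a theme, and all function names are hypothetical. The K-means++ part follows the standard seeding rule, in which each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen.

```python
import random

def topic_vector(tokens, topic_words):
    """Fraction of a short text's tokens found in each theme's word set.
    (Assumed scoring rule; the paper does not specify one here.)"""
    n = max(len(tokens), 1)
    return [sum(t in words for t in tokens) / n for words in topic_words]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: first centre uniform at random, each later centre
    drawn with probability proportional to squared distance to nearest centre."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(d2) == 0:  # every point coincides with an existing centre
            centres.append(list(rng.choice(points)))
        else:
            centres.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centres

def kmeans(points, k, seed=0, iters=50):
    """Lloyd iterations started from K-means++ centres; returns cluster labels."""
    rng = random.Random(seed)
    centres = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical theme word sets (placeholders for LDA output on long text).
topic_words = [
    {"football", "match", "team", "goal"},
    {"market", "price", "bank", "trade"},
]
# Hypothetical tokenised short texts (placeholders for microblog posts).
texts = [
    ["team", "goal", "tonight"],
    ["match", "team"],
    ["bank", "price", "up"],
    ["trade", "market", "news"],
]
vectors = [topic_vector(t, topic_words) for t in texts]
labels = kmeans(vectors, k=2)
print(labels)
```

Mapping each short text onto a small, dense theme vector before clustering is what counters the sparsity problem in this sketch: two posts that share no surface words can still land close together if their words belong to the same theme. In this toy run the first two texts fall in one cluster and the last two in the other; which cluster receives which label number depends on the random seed.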