Research on Clustering Technology Based on Co-occurrence Word Model for Tibetan Microblog

Abstract

To address the two key problems of short-text classification, very sparse features and strong context dependency, this paper proposes a clustering technique for Tibetan microblogs based on a co-occurrence word model. A Tibetan news corpus (standard long text) serves as the experimental training corpus, and an LDA model is used to construct the network of co-occurring words related to a given topic. The co-occurrence network relationships then determine the topic attribution of each Tibetan microblog short text, which mitigates the lack of semantic correlation and the data sparseness of short texts. Finally, the K-means++ clustering algorithm is used to cluster the texts. In a comparison of three clustering methods, the experimental results show that the proposed method is effective for clustering Tibetan microblog text, reaching an accuracy of 88.06%.
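The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the LDA step has already produced one set of co-occurring words per topic; the word sets, the sample posts (English placeholder tokens), and the overlap-based feature in `topic_features` are all hypothetical choices made for illustration, not the authors' published method. Only the K-means++ seeding follows the standard algorithm.

```python
import random

def topic_features(tokens, topic_words):
    """Map a tokenized short text to one value per topic: the fraction of
    its tokens that appear in that topic's co-occurrence word set.
    (Assumed feature definition, not taken from the paper.)"""
    if not tokens:
        return [0.0] * len(topic_words)
    return [sum(t in words for t in tokens) / len(tokens) for words in topic_words]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: the first centre is chosen uniformly; each later
    centre is drawn with probability proportional to its squared distance
    from the nearest centre chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # all points coincide with existing centres
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def kmeans(points, k, iters=20, seed=0):
    """Lloyd iterations started from K-means++ seeds; returns cluster labels."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centre for every point.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical per-topic co-occurrence word sets (stand-ins for LDA output).
topic_words = [
    {"match", "team", "goal", "coach"},
    {"market", "price", "bank", "trade"},
]
# Hypothetical tokenized short posts.
posts = [
    ["team", "goal"],
    ["coach", "match", "team"],
    ["bank", "price"],
    ["market", "trade", "price"],
]
vectors = [topic_features(p, topic_words) for p in posts]
labels = kmeans(vectors, k=2)
print(labels)
```

Representing each post by its overlap with topic-level word sets yields a dense, low-dimensional vector even when the post shares no words with other posts, which is one plausible way the co-occurrence network could ease the sparsity problem the abstract mentions.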

Bibliographic record

Source
International Conference on Intelligent Human-Machine Systems and Cybernetics | 2016 | 597p | 5 pages
Conference location
Authors
Ailin Li; Tao Jiang; Qingshuai Wang; Hongzhi Yu
Author affiliations
Conference organizer
Original format PDF
Language of text
CLC classification TP18-53
Keywords
Probability distribution; Clustering algorithms; Standards; Semantics; Data mining; Vocabulary

Similar literature

1. Using Hashtag Graph-Based Topic Model to Connect Semantically-Related Words Without Co-Occurrence in Microblogs [J]. Yuan Wang, Jie Liu, Yalou Huang. IEEE Transactions on Knowledge and Data Engineering, 2016, No. 7.

2. Tibetan Microblog Emotional Analysis Based on Sequential Model in Online Social Platforms [J]. Qiu Lirong, Zhang Huili, Zhang Zhen. Complexity, 2017, No. 1.

3. Research on Clustering Technology Based on Co-occurrence Word Model for Tibetan Microblog [C]. Ailin Li, Tao Jiang, Qingshuai Wang. International Conference on Intelligent Human-Machine Systems and Cybernetics, 2016.

4. Microblog search and word clouds: The impact of word clouds on user satisfaction during microblog searches [D]. Haber, Jonathan. 2010.

5. Microblog Topic-Words Detection Model for Earthquake Emergency Responses Based on Information Classification Hierarchy [O]. Xiaohui Su, Shurui Ma, Xiaokang Qiu. 2021.

6. Language clustering with word co-occurrence networks based on parallel texts [O]. HaiTao Liu, Jin Cong. 2013.
