Short Text Feature Extraction and Clustering for Web Topic Mining


Abstract

This paper introduces an algorithm, based on Chinese chunks, that clusters Chinese short texts in order to mine web topics. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The RPCL algorithm is then applied to cluster the texts with high precision, without needing to know the exact number of clusters in advance. Experimental results show that, compared with traditional methods, this approach markedly reduces dimensionality and improves clustering performance on Chinese short texts.
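The abstract does not give the details of the chunk extraction or the RPCL settings, so the following is only a minimal sketch of the two steps it names: character N-gram features, then rival penalized competitive learning (RPCL). The function names, the `min_df` document-frequency filter, the learning rates, and the toy texts are all assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import Counter

def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def ngram_vectors(texts, n=2, min_df=2):
    """Count vectors over n-grams that occur in at least `min_df` texts.
    The min_df filter is an assumed way of keeping the dimensionality low."""
    df = Counter(g for t in texts for g in set(char_ngrams(t, n)))
    vocab = sorted(g for g, c in df.items() if c >= min_df)
    index = {g: i for i, g in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0.0] * len(vocab)
        for g in char_ngrams(t, n):
            if g in index:
                v[index[g]] += 1.0
        vectors.append(v)
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rpcl(points, init_units, lr_win=0.05, lr_rival=0.002, epochs=50, seed=0):
    """RPCL sketch: the winning unit moves toward each sample, while the
    runner-up (the rival) is pushed slightly away. `init_units` holds at
    least two starting prototypes; their count is only an upper bound on
    the number of clusters."""
    rng = random.Random(seed)
    units = [list(u) for u in init_units]
    wins = [1] * len(units)  # win counts, used as a "conscience" weight
    for _ in range(epochs):
        order = points[:]
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            ranked = sorted(range(len(units)),
                            key=lambda j: wins[j] / total * sq_dist(x, units[j]))
            c, r = ranked[0], ranked[1]  # winner and rival
            units[c] = [w + lr_win * (xi - w) for w, xi in zip(units[c], x)]
            units[r] = [w - lr_rival * (xi - w) for w, xi in zip(units[r], x)]
            wins[c] += 1
    return units

def assign(points, units):
    """Label each point with the index of its nearest unit."""
    return [min(range(len(units)), key=lambda j: sq_dist(p, units[j]))
            for p in points]

# Toy usage with made-up short texts.
texts = ["网络主题", "网络热点", "主题挖掘"]
vocab, vectors = ngram_vectors(texts, n=2, min_df=2)
units = rpcl(vectors, init_units=vectors[:2])
labels = assign(vectors, units)
```

Units that end up with no points assigned to them can be discarded, which is how a method of this kind can start from more units than there are clusters.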


Bibliographic details

Source: International Conference on Semantics, Knowledge and Grid, 2007, 4 pages
Authors: Hui He; Bo Chen; Weiran Xu; Jun Guo
Original format: PDF
Chinese Library Classification: TP311.1-53

