Short Text Feature Extraction and Clustering for Web Topic Mining



Abstract

This paper introduces an algorithm, based on Chinese chunks, for clustering Chinese short texts in order to mine web topics. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The rival penalized competitive learning (RPCL) algorithm is then applied to cluster the texts with high precision, without needing to know the exact number of clusters in advance. Experimental results show that, compared with traditional methods, this approach substantially reduces dimensionality and improves the performance of Chinese short-text clustering.
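The abstract gives no implementation details, so the following is only a minimal sketch of the two steps it names: character N-gram extraction followed by RPCL clustering. The function names, the N-gram lengths (2 and 3), the document-frequency cutoff, the learning rates, and the sample texts are all assumptions made for illustration and are not taken from the paper. The RPCL update is the commonly described form (the winning center moves toward the sample, the runner-up is pushed slightly away, and frequent winners are handicapped), which may differ from the variant the authors used.

```python
import random
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Return all contiguous character n-grams (whitespace dropped)."""
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in n_values:
        grams.extend("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))
    return grams


def build_vectors(texts, n_values=(2, 3), min_df=2):
    """Turn texts into unit-length count vectors over n-grams that occur
    in at least min_df texts (an assumed, simple way to cut dimensionality)."""
    per_text = [Counter(char_ngrams(t, n_values)) for t in texts]
    doc_freq = Counter(g for counts in per_text for g in counts)
    vocab = sorted(g for g, d in doc_freq.items() if d >= min_df)
    vectors = []
    for counts in per_text:
        v = [float(counts[g]) for g in vocab]
        norm = sum(x * x for x in v) ** 0.5
        vectors.append([x / norm for x in v] if norm else v)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(vectors, k_max, lr_win=0.05, lr_rival=0.002, epochs=50, seed=0):
    """Rival penalized competitive learning with k_max candidate centers
    (requires 2 <= k_max <= len(vectors)). Surplus centers tend to be pushed
    away from the data, so the number of labels actually used can be
    smaller than k_max."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k_max)]
    wins = [1] * k_max
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            total = sum(wins)
            # Rank centers by distance weighted by their share of past wins.
            ranked = sorted(
                range(k_max),
                key=lambda j: (wins[j] / total) * sq_dist(x, centers[j]),
            )
            winner, rival = ranked[0], ranked[1]
            # Winner learns: move toward the sample.
            centers[winner] = [c + lr_win * (xi - c)
                               for c, xi in zip(centers[winner], x)]
            # Rival is penalized: move slightly away from the sample.
            centers[rival] = [c - lr_rival * (xi - c)
                              for c, xi in zip(centers[rival], x)]
            wins[winner] += 1
    labels = [min(range(k_max), key=lambda j: sq_dist(x, centers[j]))
              for x in vectors]
    return centers, labels


# Made-up sample texts, for illustration only.
texts = ["北京大学招生", "北京大学录取", "足球比赛结果", "足球比赛直播"]
vocab, vectors = build_vectors(texts)
centers, labels = rpcl(vectors, k_max=3)
print(len(vocab), "features; labels:", labels)
```

In this sketch k_max is only an upper bound on the number of clusters, which is how RPCL avoids needing the exact count. Dropping N-grams that appear in a single text is one simple, assumed way to keep the feature space small.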


Bibliographic details

Source: Third International Conference on Semantics, Knowledge, and Grid (SKG 2007) | 2007 | 1-4
Conference location: Xian (CN)
Authors: Hui He; Bo Chen; Weiran Xu; Jun Guo
Author affiliation (all four authors): School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, P.R. China, 100876
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

